DOCUMENT RESUME

ED 1[?]1 [?]14                                    TM 006 943

AUTHOR          Subkoviak, Michael J.
TITLE           Evaluation of Criterion-Referenced Reliability
                Coefficients. Final Report.
SPONS AGENCY    National Inst. of Education (DHEW), Washington, D.C.
                Basic Skills Group.
BUREAU NO       6-0357
PUB DATE        31 Dec 77
GRANT           NIE-G-76-0088
NOTE            113p.
EDRS PRICE      MF-$0.85 HC-$6.01 Plus Postage.
DESCRIPTORS     Criterion Referenced Tests; *Cutting Scores; *Data
                Analysis; Data Collection; *Mastery Tests;
                *Mathematical Models; Research Design; Research
                Methodology; Sampling; Statistical Analysis;
                Statistical Data; *Test Reliability
IDENTIFIERS     Test Length

ABSTRACT

Four different procedures were used for estimating the proportion of persons who would be classified consistently as either passing both of two parallel tests or failing both. These four methods were applied at each of four different mastery level scores for each of three different length tests. Data were based on 50 replications of each procedure for samples of 30 cases and 300 cases randomly drawn from a population of 1586 cases. The outcomes of these sampling experiments were compared with the parameter values for the population. In addition to this study, four other papers on related topics are attached. "Empirical Investigation of Procedures for Estimating Reliability for Mastery Tests" discusses the data presented in the previous paper. The next paper, "Estimating the Probability of Correct Classification in Mastery Testing," discusses the Keats-Lord model used in the preceding papers. "Further Comments on Reliability for Mastery Tests" explores the use of coefficients of classification consistency. The last paper, "Confirmatory Inference and Geometric Models," discusses the relationship between exploratory and confirmatory approaches to collecting and analyzing data and the application of geometric models to these approaches. (CTM)




***********************************************************************
*    Reproductions supplied by EDRS are the best that can be made     *
*                    from the original document.                      *
***********************************************************************



U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
NATIONAL INSTITUTE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF EDUCATION POSITION OR POLICY.




FINAL REPORT

PROJECT NO. 6-0357
GRANT NO. NIE-G-76-0088



EVALUATION OF CRITERION-REFERENCED
RELIABILITY COEFFICIENTS



MICHAEL J. SUBKOVIAK

UNIVERSITY OF WISCONSIN
MADISON, WISCONSIN 53706



December 31, 1977



The research reported herein was performed pursuant to a grant with the National Institute of Education, U.S. Department of Health, Education and Welfare. Contractors undertaking such projects under Government sponsorship are encouraged to freely express their professional judgment in the conduct of the project. Points of view or opinions stated do not, therefore, necessarily represent official National Institute of Education position or policy.



U.S. DEPARTMENT OF
HEALTH, EDUCATION AND WELFARE
NATIONAL INSTITUTE OF EDUCATION
BASIC SKILLS






Abstract

Four different procedures (Huynh, 1976; Marshall & Haertel, 1976; Subkoviak, 1976; Swaminathan, Hambleton & Algina, 1974) have been proposed for estimating the proportion of persons consistently classified as master/master or nonmaster/nonmaster on two mastery tests. Estimates of this proportion were obtained by each of the above procedures for repeated samples of 30 and 300 persons drawn from a population of 1586. These estimates were then compared for accuracy to the proportion of the population of 1586 consistently classified as master/master or nonmaster/nonmaster on two tests; hereafter, this proportion is referred to as the population parameter. Reasonably accurate estimates of the population parameter were generally obtained for all four procedures; however, instances of systematic estimation bias were observed, especially for tests of 10 items or less. For example, the Huynh procedure tended to produce underestimates of the population parameter, while the Marshall-Haertel and Subkoviak procedures produced underestimates in certain instances and overestimates in others. The Swaminathan-Hambleton-Algina procedure generally produced unbiased estimates; however, these estimates tended to deviate widely from the population parameter, especially for samples of 30 persons or less.





Introduction

For present purposes, a mastery test can be defined as a test with a single cutting score (c) that determines mastery and nonmastery subgroups, having scores above and below c respectively. In this context, reliability refers to the consistency of mastery-nonmastery decisions over repeated test administrations (Hambleton & Novick, 1973, pp. 166-167). Accordingly, the proportion of consistent mastery/mastery and nonmastery/nonmastery classifications for a group on two tests with cutting score c, symbolized P_c, has been proposed as a raw index of reliability in this context (Swaminathan, Hambleton & Algina, 1974, 1975). In addition, three procedures for estimating proportion P_c from scores on a single test have emerged (Huynh, 1976; Marshall & Haertel, 1976; Subkoviak, 1976). Thus, a teacher or other test user is faced with the problem of choosing among four different procedures for estimating P_c in the absence of any clear guidelines. The purpose of the project was to provide such guidelines by comparing population values of P_c to sample estimates of P_c. Specifically, the proportion of consistent classifications on two tests for a population of 1586, a P_c parameter, was compared for accuracy to four different P_c estimates (Huynh, 1976; Marshall & Haertel, 1976; Subkoviak, 1976; Swaminathan et al., 1974) for repeated samples of 30 and 300 persons from the same population. The results thus illustrate the extent and nature of discrepancies between parameter and estimates. The results also have indirect relevance for the process of estimating coefficient kappa, a function of P_c that has also been proposed as an index of reliability for mastery tests (see Huynh, 1976; Subkoviak, 1977; Swaminathan et al., 1974).
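As a rough illustration of how kappa depends on P_c, the sketch below assumes the usual Cohen form, kappa = (P_c - P_chance) / (1 - P_chance), where P_chance is the agreement expected from the marginal mastery rates alone. The function name and the example marginals are hypothetical and are not taken from the report.

```python
def kappa_from_pc(p_c, p_master_1, p_master_2):
    """Cohen-style kappa from the consistency proportion and the two mastery rates."""
    # Agreement expected by chance: both masters or both nonmasters.
    p_chance = p_master_1 * p_master_2 + (1 - p_master_1) * (1 - p_master_2)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_c - p_chance) / (1 - p_chance)


# Hypothetical marginals: half of the group reaches mastery on each test.
print(round(kappa_from_pc(0.67, 0.5, 0.5), 2))  # 0.34
```

With the same P_c, a different pair of mastery rates gives a different kappa, which is one reason the two indices are not interchangeable.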



Method



The data base for the project consisted of the responses of a population of 1586 students to parallel tests of 10, 30, and 50 items each from the Scholastic Aptitude Test. Each 10-item test was part of the longer 30-item test, which in turn was part of the 50-item test. Half of the items on each test were reading comprehension, and the other half were a mixture of analogy, antonym, and sentence completion items. The means, standard deviations, and KR20 reliabilities of the various tests are shown in Table 1.
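For readers unfamiliar with KR20, the following is a minimal sketch of the standard Kuder-Richardson formula 20 for right/wrong items. It assumes the population form of the total-score variance; the tiny response matrix is invented for illustration and has nothing to do with the project data.

```python
def kr20(item_matrix):
    """KR-20 for a persons-by-items matrix of 0/1 responses."""
    n_persons = len(item_matrix)
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n_persons
    # Population variance of total scores (divide by N, an assumption here).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_persons  # item difficulty
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / var_total)


# Invented responses: 4 persons, 3 items.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(responses))  # 0.75
```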

Thus, the distributions of scores for a population of 1586 students on parallel tests of n = 10, 30, and 50 items were available. For each of these n-item tests, four different mastery criteria were considered: c = 50%, 60%, 70%, and 80% items correct. For each combination of the 3 test lengths (n) and the 4 mastery criteria (c), the proportion of the 1586 students consistently classified as master/master or nonmaster/nonmaster on parallel tests of length n was computed, for a total of 3 x 4 = 12 values of parameter P_c. These parameter values appear in the 12 cells of Tables 2-5 in Appendix A.

For example, in Table 2a when n = 10 items and the mastery criterion is set at c = 50% (or 5 items correct), 67% of the 1586 students were consistently classified on two 10-item parallel tests, i.e., the parameter value is P_c = .67. This parameter value of .67 similarly appears in all the tables of Appendix A, as do the parameter values for the other cases considered.
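The 3 x 4 design can be laid out as a small grid of cutting scores. This is only a sketch: it assumes that a percentage criterion is converted to the smallest whole number of items meeting it, which agrees with the one conversion stated in the text (50% of 10 items is 5 items correct). The names are illustrative.

```python
TEST_LENGTHS = (10, 30, 50)           # n
CRITERIA_PERCENT = (50, 60, 70, 80)   # c, as percent of items correct


def cut_score(n_items, percent):
    """Smallest number of items correct that meets the percentage criterion."""
    return -(-n_items * percent // 100)  # integer ceiling division


# One cutting score for each of the 12 (n, c) combinations.
grid = {(n, c): cut_score(n, c) for n in TEST_LENGTHS for c in CRITERIA_PERCENT}

print(len(grid))       # 12
print(grid[(10, 50)])  # 5
```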

The other numbers in the first cell of Table 2a in Appendix A are respectively the mean and the standard error of 50 Swaminathan-Hambleton-Algina estimates of parameter value P_c = .67, based on 50 random samples of 30 students from the population of 1586 students. For the case of n = 10 items and c = 50%



Gary Marco of Educational Testing Service provided the data used in this study, and Barbara Albrecht and Carl Voelz of the University of Wisconsin helped with the analysis. The assistance of each is gratefully acknowledged.



Table 1

Test Statistics*

                       Test              Test Length
Statistic              Form       10          30          50

Mean                    1        4.87       14.49       24.11
                        2        4.6[?]     1[?].18     [illegible]

Standard Deviation      1        2.0[?]      5.45        8.43
                        2        2.07        4.87        7.82

KR20 Reliability        1        [illegible]  .8[?]       .87
                        2         .56         .77         .86

*Based on a population of 1586 persons






correct in Table 2a, the mean of 50 Swaminathan-Hambleton-Algina estimates was .68, and the standard error of these estimates was .08, i.e., the estimates tended to deviate from the parameter value of .67 by .08 units on the average. The first cell of Table 2b contains the same type of information for Swaminathan-Hambleton-Algina estimates based on 50 random samples of 300 persons from the population of 1586. Remaining Tables 3-5 of Appendix A provide similar information for the other estimation procedures considered in the study: Marshall-Haertel, Subkoviak, and Huynh.

Results

Swaminathan-Hambleton-Algina Procedure

The Swaminathan et al. (1974, 1975) reliability estimate is simply the proportion of persons in a sample consistently classified as master/master or nonmaster/nonmaster on two tests. As described above, this estimate was computed repeatedly for 50 samples of 30 and for 50 samples of 300 from a population of 1586 persons. The means and standard errors of these estimates, for various test lengths and mastery criteria, are shown in Tables 2a and 2b of Appendix A.
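The two-test estimate and the replication scheme can be sketched as follows. Everything about the data here is an assumption: the population is synthetic (two binomial scores per simulated person), scores at or above the cutting score are treated as mastery, and the "standard error" is taken to be the root-mean-square deviation of the 50 estimates from the population parameter. The project's own programs are not reproduced in this report.

```python
import random


def proportion_consistent(pairs, cut):
    """Share of persons classified the same way (pass/pass or fail/fail) on both tests."""
    same = sum((x >= cut) == (y >= cut) for x, y in pairs)
    return same / len(pairs)


def synthetic_population(size, n_items, rng):
    """Hypothetical stand-in for the 1586 students: two binomial scores per person."""
    people = []
    for _ in range(size):
        p = rng.betavariate(4, 4)  # arbitrary spread of true proportions correct
        form1 = sum(rng.random() < p for _ in range(n_items))
        form2 = sum(rng.random() < p for _ in range(n_items))
        people.append((form1, form2))
    return people


def replicate(population, cut, sample_size, replications, rng):
    """Return the parameter, the mean estimate, and the RMS deviation of the estimates."""
    parameter = proportion_consistent(population, cut)
    estimates = [
        proportion_consistent(rng.sample(population, sample_size), cut)
        for _ in range(replications)
    ]
    mean = sum(estimates) / replications
    rms = (sum((e - parameter) ** 2 for e in estimates) / replications) ** 0.5
    return parameter, mean, rms


rng = random.Random(1977)
population = synthetic_population(1586, 10, rng)
for size in (30, 300):
    parameter, mean, rms = replicate(population, 5, size, 50, rng)
    print(size, round(parameter, 2), round(mean, 2), round(rms, 2))
```

Because the population is invented, the printed values will not match Tables 2a and 2b; only the pattern (mean close to the parameter, smaller deviation at 300 than at 30) is meant to carry over.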

Figure 1a is a graphic representation of Table 2a for samples of 30 persons, while Figure 1b is a graphic representation of Table 2b for samples of 300. In the figures, parameter values are represented by o's; estimate means are represented by x's; and standard errors are represented by line intervals (-), indicating the extent to which estimates tend to deviate from the parameter value.
In Figures 1a and 1b, estimate means (x) generally equal corresponding parameter values (o), which suggests that Swaminathan et al. estimates are unbiased,



All computations were done via computer programs written and tested specifically for the project. The interim report of April 30, 1977 describes that phase of the project.



[Figure 1a. Means and Standard Errors of Swaminathan-Hambleton-Algina Estimates for Repeated Samples of 30 Persons. Panels: 10 items, 30 items, 50 items. Horizontal axis: mastery criterion (50%, 60%, 70%, 80%). Vertical axis: .50 to 1.00. Legend: o = parameter, x = mean estimate, - = standard error.]

[Figure 1b. Means and Standard Errors of Swaminathan-Hambleton-Algina Estimates for Repeated Samples of 300 Persons. Same panels, axes, and legend as Figure 1a.]

as might be expected for this two-test procedure.

However, for classroom size samples of 30 persons or less, the relatively large standard errors of Figure 1a suggest that Swaminathan et al. estimates tend to fluctuate about the parameter value to a greater extent than Marshall-Haertel, Subkoviak, or Huynh estimates (compare the standard errors of Figures 2a, 3a and 4a). Of course, as in Figure 1b, the standard error of estimate can be reduced by increasing sample size to, say, 300 persons, which is not unreasonable for a test publisher or other large-scale user.

It might also be noted in Figures 1a,b or Tables 2a,b that the standard error generally tends to decrease as test length (n) increases and as the mastery criterion (c) increases. These observations are attributable, at least in part, to the fact that as the parametric value of a proportion (P_c) becomes more extreme (large or small) estimates of that proportion tend to be more accurate or less variable. These same trends are repeated in subsequent figures and tables.
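The point about extreme proportions can be checked with the textbook standard error of a sample proportion, sqrt(P(1 - P)/N). This is offered as a back-of-envelope sketch, not as the computation used in the report: it ignores the finite population correction for sampling from 1586 persons. The proportions below are the 10-item parameter values quoted in Appendix A.

```python
from math import sqrt


def binomial_se(p, sample_size):
    """Standard error of a sample proportion under simple random sampling."""
    return sqrt(p * (1 - p) / sample_size)


# 10-item parameter values for the 50%, 60%, 70%, and 80% criteria.
for p in (0.67, 0.72, 0.80, 0.88):
    print(p, round(binomial_se(p, 30), 3), round(binomial_se(p, 300), 3))
```

For P_c = .67 and 30 persons the formula gives roughly .086, in line with the tabled standard error of .08, and the value shrinks as P_c moves toward 1 or as the sample grows.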

It might also be mentioned at this point that the reliability estimates of all subsequent figures and tables are based on a single administration of test Form 1 to samples of 30 and 300 students. This represents a distinct advantage over estimation procedures requiring two test forms or administrations.

Marshall-Haertel Procedure

Procedures that estimate reliability from a single test administration generally substitute certain assumptions for the missing or absent second testing. For example, the Marshall-Haertel procedure makes the hypothetical assumption that if n-item tests were repeatedly administered to an individual student, his or her distribution of observed scores would be binomial, with parameters n (number of items) and p (probability of a correct item response).



Estimates based on Form 2 of each test were very similar to those based on Form 1, as indicated in the interim project report of April 30, 1977.



Marshall and Haertel use each student's observed proportion correct score on the actual n-item test to approximate his or her binomial p parameter, and the group distribution of observed scores on a hypothetical 2n-item test is simulated. This 2n-item test is split into half tests in all possible ways, and an estimate of P_c, the proportion of consistent classification on two tests, is computed for each split. The mean of these various split-half estimates is then taken as the final estimate of P_c. See Marshall and Haertel (1976) for further details.
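The sketch below is not Marshall and Haertel's algorithm; it is a simplified stand-in that rests on the same binomial assumption. If every item is answered independently with probability p, the two halves of a 2n-item test behave like two independent n-item tests, so a person's chance of a consistent classification is q^2 + (1 - q)^2, where q is the binomial probability of reaching the cutting score. Each observed proportion correct is plugged in for p, as the text describes. The scores and names are hypothetical.

```python
from math import comb


def prob_pass(p, n_items, cut):
    """P(score >= cut) when the score is Binomial(n_items, p)."""
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(cut, n_items + 1)
    )


def binomial_consistency(scores, n_items, cut):
    """Group estimate of P_c using each observed proportion correct as p."""
    total = 0.0
    for x in scores:
        q = prob_pass(x / n_items, n_items, cut)
        total += q * q + (1 - q) * (1 - q)  # pass/pass plus fail/fail
    return total / len(scores)


scores = [2, 4, 5, 5, 6, 7, 9]  # hypothetical 10-item scores
print(round(binomial_consistency(scores, 10, 5), 2))
```

Persons whose observed proportion sits near the criterion contribute values close to .50, while persons far from it contribute values close to 1.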

Figures 2a and 2b are graphic representations of Tables 3a and 3b (Appendix A) for samples of 30 and 300 respectively. Especially for tests of 10 items or less, there appears to be slight systematic bias in the Marshall-Haertel estimates of Figures 2a,b. The estimate means (x's) for mastery criteria of 50% and 60%, which are points near the center of the unimodal test score distribution used in the study, tend to overestimate the parameter (o's). Conversely, estimate means (x's) for mastery criteria of 70% or 80%, which are points in the tails of the distribution, tend to slightly underestimate the parameter (o's). Algina and Noe (1977, p. 6) report the same type of bias in a somewhat different context and relate it to the use of students' observed proportion correct scores as approximations of the binomial p parameter. The magnitude of such bias should decrease as test length increases, as it does in Figures 2a,b for tests of 30 and 50 items, since observed proportions provide better approximations to the binomial p parameter as the number of items or trials increases.

Subkoviak Procedure

This procedure assumes that if n-item tests were repeatedly administered to an individual, his or her distribution of observed scores would be compound binomial and that the individual's score on one test does not affect his or her scores on the other tests. Individuals' observed proportion correct scores



[Figure 2a. Means and Standard Errors of Marshall-Haertel Estimates for Repeated Samples of 30 Persons. Panels: 10 items, 30 items, 50 items. Horizontal axis: mastery criterion (50%, 60%, 70%, 80%). Vertical axis: .50 to 1.00. Legend: o = parameter, x = mean estimate, - = standard error.]

[Figure 2b. Means and Standard Errors of Marshall-Haertel Estimates for Repeated Samples of 300 Persons. Same panels, axes, and legend as Figure 2a.]

on a test and the associated KR-20 coefficient are then used to obtain linear regression approximations of individuals' compound binomial p parameters. A group estimate of P_c, the proportion of consistent classifications on two tests, can then be obtained from the individual compound binomial distributions. See Subkoviak (1976) for details.
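A minimal sketch of the regression step follows, with two stated assumptions. First, the linear regression approximation is taken to be the familiar shrinkage of the observed proportion toward the group mean proportion, weighted by the reliability coefficient. Second, the simple binomial is used in place of the compound binomial to keep the example short. The scores and the reliability value are hypothetical.

```python
from math import comb


def prob_pass(p, n_items, cut):
    """P(score >= cut) when the score is Binomial(n_items, p)."""
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(cut, n_items + 1)
    )


def regressed_p(score, n_items, group_mean, reliability):
    """Observed proportion correct pulled toward the group mean proportion."""
    return reliability * score / n_items + (1 - reliability) * group_mean / n_items


def regression_consistency(scores, n_items, cut, reliability):
    """Group estimate of P_c from regressed proportions (simple binomial assumed)."""
    group_mean = sum(scores) / len(scores)
    total = 0.0
    for x in scores:
        p = regressed_p(x, n_items, group_mean, reliability)
        q = prob_pass(p, n_items, cut)
        total += q * q + (1 - q) * (1 - q)  # pass/pass plus fail/fail
    return total / len(scores)


scores = [2, 4, 5, 5, 6, 7, 9]  # hypothetical 10-item scores
print(round(regression_consistency(scores, 10, 5, 0.56), 2))
```

With a reliability of 1 the regressed value equals the observed proportion; lower reliabilities pull every person toward the group mean, which is the feature that can mislead when the score distribution is multimodal.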



Figures 3a,b are graphic representations of Tables 4a,b of Appendix A. Algina and Noe (1977, p. 6) report slight, systematic bias in Subkoviak estimates for simulated data. They found parameter estimates to be too small for mastery criteria near the center of a unimodal test score distribution and too large for mastery criteria in the tails of the distribution. This trend also seems to be present in the 10-item test of Figures 3a,b; however, the evidence of such a trend in the 30 or 50 item tests of Figures 3a,b is somewhat less compelling.



While the Subkoviak procedure produces reasonably accurate parameter estimates for the unimodal test score distributions considered herein, as evidenced by the relatively small standard errors in Figures 3a,b, grossly inaccurate estimates can occur if linear regression approximations of binomial parameter p are blindly obtained for multimodal data sets (see Huynh, 1977, Counterexample 2; Subkoviak, 1976, p. 269). While more complex regression techniques could be employed to approximate p in such cases, the procedure discussed next provides a more tractable solution for most data sets likely to arise in practice, e.g., unimodal, bimodal, or uniform.

Huynh Procedure

Basically, this procedure assumes that the distribution of observed scores over repeated testing of an individual is binomial with parameters n and p and that such test outcomes are independent of one another. In addition, the distribution of individuals' binomial p parameters is assumed to be beta in form (see LaValle,


[Figure 3a. Means and Standard Errors of Subkoviak Estimates for Repeated Samples of 30 Persons. Panels: 10 items, 30 items, 50 items. Horizontal axis: mastery criterion (50%, 60%, 70%, 80%). Vertical axis: .50 to 1.00. Legend: o = parameter, x = mean estimate, - = standard error.]

[Figure 3b. Means and Standard Errors of Subkoviak Estimates for Repeated Samples of 300 Persons. Same panels, axes, and legend as Figure 3a.]

1970, p. 256 for examples). Under these assumptions, the bivariate distribution of observed scores on two testings for the group is beta-binomial in form and can be simulated from scores on a single test administration. Estimates of parameter P_c can then be obtained from this simulated beta-binomial distribution. See Huynh (1976) for further explication.
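One way to sketch the beta-binomial calculation is shown below. It assumes moment estimates of the beta parameters obtained through KR-21 and sums the bivariate beta-binomial probabilities over the pass/pass and fail/fail regions; treat it as an illustration under those assumptions rather than a transcription of Huynh (1976). The inputs echo the Form 1, 30-item mean and standard deviation in Table 1 and are used only as an example.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def beta_parameters(mean, sd, n_items):
    """Moment estimates of the beta parameters via KR-21 (an assumption of this sketch)."""
    kr21 = n_items / (n_items - 1) * (1 - mean * (n_items - mean) / (n_items * sd**2))
    if not 0 < kr21 < 1:
        raise ValueError("KR-21 must lie strictly between 0 and 1 for this sketch")
    alpha = (1 / kr21 - 1) * mean
    beta = n_items / kr21 - n_items - alpha
    return alpha, beta


def joint_probability(x, y, n_items, alpha, beta):
    """Bivariate beta-binomial probability of scores x and y on two n-item tests."""
    return exp(
        log_comb(n_items, x)
        + log_comb(n_items, y)
        + log_beta(alpha + x + y, beta + 2 * n_items - x - y)
        - log_beta(alpha, beta)
    )


def beta_binomial_consistency(mean, sd, n_items, cut):
    """Estimate of P_c: total probability of pass/pass or fail/fail."""
    alpha, beta = beta_parameters(mean, sd, n_items)
    total = 0.0
    for x in range(n_items + 1):
        for y in range(n_items + 1):
            if (x >= cut) == (y >= cut):
                total += joint_probability(x, y, n_items, alpha, beta)
    return total


print(round(beta_binomial_consistency(14.49, 5.45, 30, 15), 2))
```

Only the test mean, standard deviation, and length enter the calculation, so no individual regression step is needed.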

Figures 4a,b are graphic representations of Tables 5a,b (Appendix A). The trend in Figures 4a,b would appear to be toward conservative estimation for tests of 10 items or less. However, for tests of 30 or 50 items the Huynh estimates are generally quite good, as evidenced by the coincidence of means (x's) and parameters (o's) as well as the small standard errors of Figures 4a,b.

Conclusions

All four procedures (Huynh, 1976; Marshall & Haertel, 1976; Subkoviak, 1976; Swaminathan et al., 1974) appear to provide reasonably accurate estimates of parameter P_c, the proportion of consistent classifications on two mastery tests, for the various cases considered herein. In particular, the Huynh procedure seems especially tractable. The following specific conclusions also appear to be supported by Figures 1-4.



1. The two-test Swaminathan-Hambleton-Algina procedure produces unbiased estimates. However, the standard error of these estimates is relatively large for classroom size samples of 30 or fewer persons. For samples of 300 or more, the standard errors are quite small.

2. For unimodal distributions of scores on tests of 10 items or less, the one-test Marshall-Haertel procedure produces overestimates for mastery criteria near the center of the distribution and underestimates for criteria in the tails. The standard error of Marshall-Haertel estimates is relatively small for classroom size samples of 30 or more persons.



[Figure 4a. Means and Standard Errors of Huynh Estimates for Repeated Samples of 30 Persons. Panels: 10 items, 30 items, 50 items. Horizontal axis: mastery criterion (50%, 60%, 70%, 80%). Vertical axis: .50 to 1.00. Legend: o = parameter, x = mean estimate, - = standard error.]

[Figure 4b. Means and Standard Errors of Huynh Estimates for Repeated Samples of 300 Persons. Same panels, axes, and legend as Figure 4a.]



3. For unimodal distributions of scores on tests of 10 items or less, the one-test Subkoviak procedure produces underestimates for mastery criteria near the center of the distribution and overestimates for criteria in the tails. The standard error of Subkoviak estimates is relatively small for samples of 30 or more persons.

4. For unimodal distributions of scores on tests of 10 items or less, the one-test Huynh procedure produces underestimates. The standard errors of Huynh estimates are relatively small for samples of 30 or more persons.

It might also be added that the Huynh procedure appears to have the most sound mathematical basis of the three one-test approaches. The two-test Swaminathan-Hambleton-Algina procedure is also quite tractable in this sense.

Related Research

The foregoing represents the work contracted for in the original project proposal. In addition, a number of papers on related topics were completed and submitted for publication during the grant period. Titles and abstracts of these papers follow. Copies of these papers are also included in Appendix B.

Subkoviak, M. J. Empirical investigation of procedures for estimating reliability for mastery tests. Manuscript submitted for publication, 1977.

Abstract

Four different procedures (Huynh, 1976; Marshall & Haertel, 1976; Subkoviak, 1976; Swaminathan, Hambleton & Algina, 1974) have been proposed for estimating the proportion of persons consistently classified as master/master or nonmaster/nonmaster on two mastery tests. Estimates of this proportion were obtained for repeated samples of size N = 30 for each of the above procedures. The estimates were then compared for accuracy to the value of this proportion in the population of N = 1586 subjects from which the samples were drawn. Both test length and mastery criterion were varied. While reasonably accurate estimates were generally obtained for all four procedures, instances of systematic estimation bias were observed.

Subkoviak, M. J. Further comments on reliability for mastery tests. Manuscript submitted for publication, 1977.

Abstract

This paper illustrates that the various coefficients of classification consistency that have been proposed as measures of reliability for mastery tests have different interpretations and statistical properties. As such, they should not be applied indiscriminately. Rather, a user should employ that coefficient that is most meaningful within the context of a particular problem.

Subkoviak, M. J., & Wilcox, R. Estimating the probability of correct classification in mastery testing. Manuscript submitted for publication, 1977.

Abstract

A procedure is proposed for estimating the proportion of persons in a group that are correctly classified on a mastery test, i.e., the proportion whose observed classification agrees with their true classification. A numerical example is provided, and extensions of the procedure are discussed.

Hubert, L. J., & Subkoviak, M. J. Confirmatory inference and geometric models. Manuscript submitted for publication, 1977.

Abstract

A confirmatory method is discussed for comparing an "outside" variable to a given geometric model, or alternatively, to the raw data from which the model is derived. The inference procedure is based on relatively simple nonparametric principles and requires the comparison of a proximity matrix generated from a geometric representation against a second "structure" matrix obtained from the outside variable under study. A number of examples are presented that illustrate how the same statistical approach can be applied in evaluating geometric models that arise in a number of ways, for instance, those produced by some explicit data reduction process, or possibly, models generated by naturally occurring spatial contiguity.







References

Algina, J., & Noe, M. J. An investigation of Subkoviak's single-administration reliability estimate for criterion-referenced tests. Paper presented at the annual meeting of the American Educational Research Association, New York City, 1977.

Hambleton, R. K., & Novick, M. R. Toward an integration of theory and method for criterion-referenced tests. Journal of Educational Measurement, 1973, 10, 159-170.

Huynh, H. On the reliability of decisions in domain-referenced testing. Journal of Educational Measurement, 1976, 13, 253-264.

Huynh, H. Reliability of criterion-referenced tests: Comments on a paper by Subkoviak. Unpublished manuscript. University of South Carolina, 1977.

LaValle, I. H. An Introduction to Probability, Decision, and Inference. New York: Holt, Rinehart and Winston, 1970.

Marshall, J. L., & Haertel, E. H. The mean split-half coefficient of agreement: A single administration index of reliability for mastery tests. Unpublished manuscript. University of Wisconsin, 1976.

Subkoviak, M. J. Estimating reliability from a single administration of a criterion-referenced test. Journal of Educational Measurement, 1976, 13, 265-276.

Subkoviak, M. J. Further comments on reliability for mastery tests (Laboratory of Experimental Design, Occasional Paper No. 17). Unpublished manuscript. University of Wisconsin, 1977.

Swaminathan, H., Hambleton, R. K., & Algina, J. Reliability of criterion-referenced tests: A decision-theoretic formulation. Journal of Educational Measurement, 1974, 11, 263-267.

Swaminathan, H., Hambleton, R. K., & Algina, J. A Bayesian decision-theoretic procedure for use with criterion-referenced tests. Journal of Educational Measurement, 1975, 12, 87-98.






Table 2a

Means and Standard Errors of Swaminathan-Hambleton-Algina
Estimates for Repeated Samples of 30 Persons

Mastery          Statistical          Test Length (n)
Criterion (c)    Index             10        30        50

50%              Parameter        .67       .79       .83
                 Mean             .68       .79       .84
                 St. Error        .08       .07       .06

60%              Parameter        .72       .84       .87
                 Mean             .72   [illegible]   .87
                 St. Error        .07       .06       .06

70%              Parameter        .80       .88       .91
                 Mean             .79       .88       .91
                 St. Error        .08       .06       .05

80%              Parameter        .88       .94       .96
                 Mean             .87       .93       .96
                 St. Error        .06       .05       .08



Table 2b

Means and Standard Errors of Swaminathan-Hambleton-Algina
Estimates for Repeated Samples of 300 Persons

Mastery          Statistical          Test Length (n)
Criterion (c)    Index             10        30        50

50%              Parameter        .67       .79       .83
                 Mean             .67       .79       .83
                 St. Error        .02       .02       .02

60%              Parameter        .72       .84       .87
                 Mean             .72       .84       .87
                 St. Error        .02       .02       .02

70%              Parameter        .80       .88       .91
                 Mean             .80       .88       .91
                 St. Error        .02       .02       .01

80%              Parameter        .88       .94       .96
                 Mean             .88       .94       .96
                 St. Error        .02       .01       .01



Table 3a

Means and Standard Errors of Marshall-Haertel
Estimates for Repeated Samples of 30 Persons

Mastery          Statistical          Test Length (n)
Criterion (c)    Index             10        30        50

50%              Parameter        .67       .79       .83
                 Mean             .74       .82       .84
                 St. Error        .08       .04       .03

60%              Parameter        .72       .84       .87
                 Mean             .75       .84       .87
                 St. Error        .05       .03       .05

70%              Parameter        .80       .88       .91
                 Mean             .79       .88       .91
                 St. Error        .03       .03       .03

80%              Parameter        .88       .94       .96
                 Mean             .85       .93       .96
                 St. Error    [illegible]   .03       .02



Table 3b

Means and Standard Errors of Marshall-Haertel Estimates
for Repeated Samples of 300 Persons

Mastery         Statistical        Test Length (n)
Criterion (c)   Index             10      30      50

50%             Parameter        .67     .79     .83
                Mean             .73     .81     .83
                St. Error        .06     .02     .01

60%             Parameter        .72     .84     .87
                Mean             .74     .84     .87
                St. Error        .03     .01     .01

70%             Parameter        .80     .88     .91
                Mean             .79     .89     .92
                St. Error        .01     .01     .01

80%             Parameter        .88     .94     .96
                Mean             .85     .94     .96
                St. Error        .03     .01     .01



Table 4a

Means and Standard Errors of Subkoviak Estimates
for Repeated Samples of 30 Persons

Mastery         Statistical        Test Length (n)
Criterion (c)   Index             10      30      50

50%             Parameter        .67     .79     .83
                Mean             .66     .81     .84
                St. Error        .06     .04     .03

60%             Parameter        .72     .84     .87
                Mean             .69     .84     .88
                St. Error        .06     .04     .03

70%             Parameter        .80     .88     .91
                Mean             .79     .89     .95
                St. Error        .05     .04     .03

80%             Parameter        .88     .94     .96
                Mean             .90     .95     .97
                St. Error        .05     .03     .02



Table 4b

Means and Standard Errors of Subkoviak Estimates
for Repeated Samples of 300 Persons

Mastery         Statistical        Test Length (n)
Criterion (c)   Index             10      30      50

50%             Parameter        .67     .79     .83
                Mean             .64     .79     .83
                St. Error        .04     .01     .01

60%             Parameter        .72     .84     .87
                Mean             .66     .83     .87
                St. Error        .06     .02     .01

70%             Parameter        .80     .88     .91
                Mean             .77     .90     .93
                St. Error        .03     .02     .01

80%             Parameter        .88     .94     .96
                Mean             .90     .96      ?
                St. Error        .03     .02     .01



Table 5a

Means and Standard Errors of Huynh Estimates
for Repeated Samples of 30 Persons

Mastery         Statistical        Test Length (n)
Criterion (c)   Index             10      30      50

50%             Parameter        .67     .79     .83
                Mean             .66     .80     .83
                St. Error        .06     .03     .02

60%             Parameter        .72     .84     .87
                Mean             .67     .82     .86
                St. Error        .06     .03     .02

70%             Parameter        .80     .88     .91
                Mean             .76     .88     .91
                St. Error        .06     .03     .02

80%             Parameter        .88     .94     .96
                Mean             .86     .94     .96
                St. Error        .05     .02     .02



Table 5b

Means and Standard Errors of Huynh Estimates
for Repeated Samples of 300 Persons

Mastery         Statistical        Test Length (n)
Criterion (c)   Index             10      30      50

50%             Parameter        .67     .79     .83
                Mean             .65     .79     .82
                St. Error        .03     .01     .01

60%             Parameter        .72     .84     .87
                Mean             .66     .81     .85
                St. Error        .06     .03     .01

70%             Parameter        .80     .88     .91
                Mean             .74     .88     .91
                St. Error        .06     .01     .01

80%             Parameter        .88     .94     .96
                Mean             .86     .94     .97
                St. Error        .02     .01     .01



Empirical Investigation of Procedures for Estimating
Reliability for Mastery Tests

Michael J. Subkoviak

University of Wisconsin

Running head: Reliability for Mastery Tests



Abstract

Four different procedures (Huynh, 1976; Marshall & Haertel, 1976;
Subkoviak, 1976; Swaminathan, Hambleton & Algina, 1974) have been proposed
for estimating the proportion of persons consistently classified as
master/master or nonmaster/nonmaster on two mastery tests. Estimates of this
proportion were obtained for repeated samples of size N = 30 for each of the
above procedures. The estimates were then compared for accuracy to the value
of this proportion in the population of N = 1586 subjects from which the
samples were drawn. Both test length and mastery criterion were varied. While
reasonably accurate estimates were generally obtained for all four procedures,
instances of systematic estimation bias were observed.






Empirical Investigation of Procedures for Estimating
Reliability for Mastery Tests



For present purposes, a mastery test can be loosely defined as a test
with a single cutting score, c, that determines mastery and nonmastery
classes: scores above and below c, respectively. In this context, reliability
refers to the consistency of mastery-nonmastery decisions over repeated test
administrations (Hambleton & Novick, 1973, pp. 166-167). Accordingly, the
proportion of consistent mastery/mastery and nonmastery/nonmastery
classifications on two tests with cutting score c, symbolized $P_c$, has been
proposed as a raw index of reliability in this context (Swaminathan,
Hambleton & Algina, 1974, 1975). In addition, three procedures for estimating
this proportion from scores on a single test have emerged (Huynh, 1976;
Marshall & Haertel, 1976; Subkoviak, 1976). Thus, a teacher or other test user
is faced with the problem of choosing among four different procedures for
estimating $P_c$ in the absence of any clear guidelines. The purpose of this
brief note is to report the results of a simple empirical exercise in which
population values of $P_c$ were compared to sample estimates of $P_c$.
Specifically, the proportion of consistent classifications on two tests for a
population of 1586, a $P_c$ parameter, was compared for accuracy to four
different $P_c$ estimates (Huynh, 1976; Marshall & Haertel, 1976; Subkoviak,
1976; Swaminathan et al., 1974) for repeated samples of 30 from the same
population. The results thus illustrate the extent and nature of
discrepancies between parameter and estimates. The results also have indirect
relevance for the process of estimating coefficient kappa, a function of
$P_c$ that has also been proposed as an index of reliability for mastery
tests (see Huynh, 1976; Subkoviak, 1977; Swaminathan et al., 1974).



Method and Results

The data base consisted of the responses of 1586 students to parallel
forms of 10, 30, and 50 items each from the Scholastic Aptitude Test. (The
10-item forms were included as part of the 30-item forms, which in turn were
part of the 50-item forms.) The means, standard deviations, and KR20
reliabilities of the various forms are shown in Table 1. Half of the items on
each form were reading comprehension, and the other half were a mixture of
analogy, antonym, and sentence completion items.



Insert Table 1 here 



Thus, the distributions of scores for 1586 students on parallel tests of
n = 10, 30, and 50 items were available. Four different mastery criteria were
considered for each n-item test: c = 50%, 60%, 70%, and 80% correct. For
each combination of the three test lengths (n) and the four mastery criteria
(c), the proportion ($P_c$) of the 1586 students consistently classified as
master/master or nonmaster/nonmaster (on parallel forms of length n) was
computed, for a total of 12 values of parameter $P_c$. These values appear in
each cell of Tables 2-5. For example, when n = 10 and the mastery criterion
is set at c = 50% (or 5 items) correct in Tables 2-5, 67% of the 1586 students
were consistently classified on two 10-item parallel forms, i.e., $P_c$ = .67.

The other numbers in each cell of Tables 2-5 are the mean and the
standard error of 50 estimates of parameter $P_c$ based on 50 random samples
of 30 students from the same population of 1586. For example, when n = 10 and
c = 50% in Table 2, the mean of the 50 estimates is .68, and their standard
error is .08; i.e., the estimates tend to deviate from the parameter value
(.67) by .08 units on the average.
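The comparison just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are synthetic (the SAT data are not reproduced here), mastery is taken to mean a score of at least c, the "standard error" is computed as the standard deviation of the 50 estimates, and the names `consistency` and `sampling_experiment` are hypothetical.

```python
import random
from statistics import mean, stdev

def consistency(form1, form2, c):
    """Proportion of persons classified the same way (both >= c or both < c)."""
    same = sum((a >= c) == (b >= c) for a, b in zip(form1, form2))
    return same / len(form1)

def sampling_experiment(form1, form2, c, sample_size=30, reps=50, seed=0):
    """Mean and standard deviation of two-test estimates over repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Draw persons without replacement, keeping their two scores paired.
        pick = rng.sample(range(len(form1)), sample_size)
        estimates.append(consistency([form1[i] for i in pick],
                                     [form2[i] for i in pick], c))
    return mean(estimates), stdev(estimates)

# Synthetic population (an assumption): 1586 persons, two 10-item parallel forms.
rng = random.Random(1)
abilities = [rng.betavariate(5, 5) for _ in range(1586)]
form1 = [sum(rng.random() < p for _ in range(10)) for p in abilities]
form2 = [sum(rng.random() < p for _ in range(10)) for p in abilities]

parameter = consistency(form1, form2, c=5)            # population value
mean_est, se_est = sampling_experiment(form1, form2, c=5)
print(round(parameter, 2), round(mean_est, 2), round(se_est, 2))
```

The three printed numbers play the roles of the Parameter, Mean, and St. Error entries of one table cell; they will not match the reported SAT values because the data are simulated.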






Swaminathan-Hambleton-Algina Procedure

In Table 2, the fact that cell means generally equal corresponding
parameter values (and the two do not deviate in any obvious systematic
fashion) suggests that Swaminathan et al. estimates are unbiased, as might be
expected for this two-test procedure. However, as will become apparent, the
standard errors of Table 2 tend to be somewhat larger than those of Tables
3-5, suggesting that Swaminathan et al. estimates tend to fluctuate about the
parameter value to a greater extent than Marshall-Haertel, Huynh, or
Subkoviak estimates. Of course, the standard error could easily be reduced by
increasing the sample size from N = 30 to, say, N = 100.

Insert Table 2 here

It might also be noted in Table 2 that the standard error generally
tends to decrease as test length (n) increases and as the mastery criterion
(c) increases. These observations are attributable, at least in part, to the
fact that as the parametric value of a proportion ($P_c$) becomes more
extreme (large or small), estimates of that proportion tend to be more
accurate (less variable). These same trends are repeated in subsequent
Tables 3-5. It should also be mentioned at this point that the reliability
estimates of Tables 3-5 are based on one administration of test Form 1 to
samples of 30 students. Estimates based on parallel Form 2 of each test are
very similar and are not reported here.
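The two remarks above (larger samples and more extreme proportions both shrink the standard error) can be illustrated with the textbook standard error of a sample proportion. The paper does not state this formula; it is used here as an assumption, together with a finite-population correction for sampling without replacement from 1586 persons. The function name `proportion_se` is hypothetical.

```python
from math import sqrt

def proportion_se(p, n, population=None):
    """Binomial standard error of a proportion, optionally with a
    finite-population correction for sampling without replacement."""
    se = sqrt(p * (1 - p) / n)
    if population is not None:
        se *= sqrt((population - n) / (population - 1))
    return se

# Parameter values taken from Table 2; sample sizes 30, 100, and 300.
for p in (0.67, 0.83, 0.96):
    print(p, [round(proportion_se(p, n, population=1586), 3) for n in (30, 100, 300)])
```

Under this approximation the standard error falls as the sample size grows and as the proportion moves away from .50, which is the pattern the text describes.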

Marshall-Haertel Procedure

This procedure assumes that the distribution of observed scores over
repeated testing of a fixed individual student is binomial in form. Students'
observed proportion correct scores on an actual n-item test are used as
approximations to the binomial p parameter, and the group distribution of
observed scores on a hypothetical 2n-item test is simulated. This 2n-item
test is split into half-tests in all possible ways, and an estimate of $P_c$
is computed for each split. The mean of these various split-half estimates is
then taken as the final estimate of $P_c$. See Marshall and Haertel (1976)
for further details. Table 3 contrasts Marshall-Haertel means with parameter
values.



Insert Table 3 here

There appears to be a slight, systematic bias in the estimates (means) of
Table 3. The estimates in the top two rows, corresponding to mastery criteria
near the mean of the test score distribution, tend to overestimate the
parameter, especially for short tests. Conversely, estimates in the bottom
two rows, corresponding to mastery criteria in the tails of the distribution,
tend to slightly underestimate the parameter. Algina and Noe (1977) report
the same type of bias in a somewhat different context and relate it to the
use of students' observed proportion correct scores as approximations of the
binomial p parameter. The magnitude of such bias should decrease as test
length (n) increases (as in Table 3), since observed proportions provide
better approximations to the binomial p parameter as the number of items
(trials) increases.

Subkoviak Procedure

This procedure assumes that the distribution of observed scores over
repeated testing is compound binomial and that test outcomes are independent
for a fixed individual. Observed proportion correct scores on a test and the
associated KR-20 coefficient are used to obtain linear regression
approximations to the compound binomial p parameter. A group estimate of
$P_c$, the proportion of consistent classifications on two tests, can then be
obtained from the individual compound binomial distributions. See Subkoviak
(1976) for details.
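A rough sketch of the logic of this procedure follows. It is not the author's program: the simple binomial is used as a stand-in for the compound binomial, the regression estimate is assumed to take the familiar form that shrinks each observed proportion toward the group mean by the reliability coefficient, and the function names are hypothetical.

```python
from math import comb

def binom_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def subkoviak_style_estimate(scores, n, c, reliability):
    """Single-administration estimate of P_c (binomial approximation)."""
    group_mean = sum(scores) / len(scores)
    total = 0.0
    for x in scores:
        # Assumed regression estimate of the person's true proportion correct.
        p_hat = reliability * (x / n) + (1 - reliability) * (group_mean / n)
        pass_prob = binom_tail(n, p_hat, c)
        # Independent testings: consistent if both pass or both fail.
        total += pass_prob**2 + (1 - pass_prob)**2
    return total / len(scores)

# Hypothetical scores on a 10-item test, cutting score of 5 items, KR-20 of .55.
scores = [2, 3, 4, 4, 5, 5, 5, 6, 7, 9]
print(round(subkoviak_style_estimate(scores, n=10, c=5, reliability=0.55), 2))
```

Each person contributes the probability of being classified the same way on two independent testings; averaging these over the group gives the estimate of $P_c$.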

Table 4 contrasts Subkoviak means with parameter values.

Insert Table 4 here

Algina and Noe (1977, p. 6) report slight, systematic bias in Subkoviak
estimates (based on linear regression approximations of binomial parameter p)
for simulated data. Specifically, they found estimates of $P_c$ to be too
small for mastery criteria (c) near the test score distribution mean and too
large for mastery criteria in the tails of the distribution. Such a trend is
not obvious in Table 4. If, indeed, there is a trend, it may be toward slight
underestimation for short tests (n = 10) and slight overestimation for long
tests (n = 50); but the evidence is not totally convincing.

While the Subkoviak procedure produces reasonably accurate $P_c$
estimates in this case, as evidenced by the small standard errors, grossly
inaccurate estimates can occur if linear regression approximations of
binomial parameter p are blindly obtained for multimodal data sets (see
Huynh, 1977, Counterexample 2; Subkoviak, 1976, p. 269). While more complex
regression techniques could be employed to approximate p in such cases, the
procedure discussed next provides a more tractable solution for most data
sets likely to arise in practice, e.g., unimodal, uniform, or bimodal.

Huynh Procedure

Basically, this procedure assumes that the distribution of observed
scores over repeated testing is binomial and that test outcomes are
independent for a fixed individual. In addition, the associated distribution
of individuals' p parameters is assumed to be beta in form (see LaValle,
1970, p. 256 for examples). Under these assumptions, the bivariate
distribution of scores on two testings for the group is beta-binomial in form
and can be approximated from scores on a single test administration.
Estimates of $P_c$ can then be obtained from the simulated beta-binomial
distribution. See Huynh (1976) for further explication.
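The beta-binomial idea can be sketched as follows. This is a minimal illustration, not Huynh's own algorithm: the beta parameters are assumed to be estimated from the test mean, variance, and KR21 (alpha = mean(1/KR21 - 1), beta = (n - mean)(1/KR21 - 1)), and the function names are hypothetical. The input values are illustrative, chosen to be roughly like a 10-item form with mean near 4.8 and standard deviation 2.00.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_consistency(n, c, mean, variance):
    """P_c under a bivariate beta-binomial model fitted by moments (assumed)."""
    kr21 = (n / (n - 1)) * (1 - mean * (n - mean) / (n * variance))
    alpha = mean * (1 / kr21 - 1)
    beta = (n - mean) * (1 / kr21 - 1)

    def joint(x, y):
        # Probability of score x on one testing and y on the other.
        return (comb(n, x) * comb(n, y)
                * exp(log_beta(alpha + x + y, beta + 2 * n - x - y)
                      - log_beta(alpha, beta)))

    # Sum over score pairs falling on the same side of the cutting score.
    return sum(joint(x, y)
               for x in range(n + 1) for y in range(n + 1)
               if (x >= c) == (y >= c))

print(round(beta_binomial_consistency(n=10, c=5, mean=4.8, variance=4.0), 2))
```

Only single-administration statistics enter the calculation, which is what makes the approach convenient; the sketch requires a KR21 value strictly between 0 and 1.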

Insert Table 5 here

Huynh means and parameter values are contrasted in Table 5. The trend
in Table 5 (as in Table 4) would appear to be toward conservative estimation
for short tests. However, the estimates are generally quite good, as
evidenced by the small standard errors.

Conclusion

All four procedures (Huynh, 1976; Marshall & Haertel, 1976; Subkoviak,
1976; Swaminathan et al., 1974) appear to provide reasonably accurate
estimates of $P_c$, the proportion of consistent classifications on two
mastery tests, for the various cases considered. In particular, the Huynh
procedure seems especially tractable. Table 6 shows the means and standard
errors of Huynh estimates based on 50 samples of 300 persons from the same
population of 1586. Test publishers that employ large pilot samples might
expect results like these.

Insert Table 6 here

While it might seem inappropriate to employ SAT data in the present study
rather than mastery test data, this would not appear to be a serious
limitation. Statistically speaking, the key issue is the performance of each
estimation procedure as the parametric value of proportion $P_c$ ranges
between .50 and 1.00, regardless of the data base. In fact, it is interesting
to note that the Marshall-Haertel, Subkoviak, and Huynh procedures, which
basically assume item homogeneity, produced accurate estimates for the
heterogeneous SAT items employed herein.



References

Algina, J., & Noe, M. J. An investigation of Subkoviak's single-administration
reliability estimate for criterion-referenced tests. Paper presented at
the annual meeting of the American Educational Research Association, New
York City, 1977.

Hambleton, R. K., & Novick, M. R. Toward an integration of theory and method
for criterion-referenced tests. Journal of Educational Measurement, 1973,
10, 159-170.

Huynh, H. On the reliability of decisions in domain-referenced testing.
Journal of Educational Measurement, 1976, 13, 253-264.

Huynh, H. Reliability of criterion-referenced tests: Comments on a paper
by Subkoviak. Unpublished manuscript, University of South Carolina, 1977.

LaValle, I. H. An Introduction to Probability, Decision, and Inference.
New York: Holt, Rinehart and Winston, 1970.

Marshall, J. L., & Haertel, E. H. The mean split-half coefficient of
agreement: A single administration index of reliability for mastery tests.
Unpublished manuscript, University of Wisconsin, 1976.

Subkoviak, M. J. Estimating reliability from a single administration of a
criterion-referenced test. Journal of Educational Measurement, 1976,
13, 265-276.

Subkoviak, M. J. Further comments on reliability for mastery tests
(Laboratory of Experimental Design, Occasional Paper No. [illegible]).
Unpublished manuscript, University of Wisconsin, 1977.

Swaminathan, H., Hambleton, R. K., & Algina, J. Reliability of
criterion-referenced tests: A decision-theoretic formulation. Journal of
Educational Measurement, 1974, 11, 263-267.

Swaminathan, H., Hambleton, R. K., & Algina, J. A Bayesian decision-theoretic
procedure for use with criterion-referenced tests. Journal of Educational
Measurement, 1975, 12, 87-98.






Footnotes

This research was made possible by Grant No. NIE-G-76-0088 from the
National Institute of Education.

Gary Marco of the College Entrance Examination Board and Educational
Testing Service provided the data used in this study, while Barbara Albrecht
and Carl Voelz helped with the analyses. The assistance of each is gratefully
acknowledged.










Table 1

Test Statistics (a)

                      Test            Test Length
Statistic             Form       10       30       50

Mean                  1         4.8?    14.49    24.11
                      2         4.?7    15.18    25.05

Standard Deviation    1         2.00     5.45     8.43
                      2         2.07     4.87     7.83

KR20 Reliability      1          .55      .81      .87
                      2          .56      .77      .86

(a) N = 1586



Table 2

Means and Standard Errors of Swaminathan-Hambleton-Algina Estimates (a)

Mastery         Statistical        Test Length (n)
Criterion (c)   Index             10      30      50

50%             Parameter        .67     .79     .83
                Mean             .68     .79     .84
                St. Error        .08     .07     .06

60%             Parameter        .72     .84     .87
                Mean             .72     .83     .87
                St. Error        .07     .06     .06

70%             Parameter        .80     .88     .91
                Mean             .79     .88     .91
                St. Error        .08     .06     .05

80%             Parameter        .88     .94     .96
                Mean             .87     .93     .96
                St. Error        .06     .05     .08

(a) Means and standard errors are based on 50 samples of 30 persons.



Table 3

Means and Standard Errors of Marshall-Haertel Estimates (a)

Mastery         Statistical        Test Length (n)
Criterion (c)   Index             10      30      50

50%             Parameter        .67     .79     .83
                Mean             .74     .82     .84
                St. Error        .08     .04     .03

60%             Parameter        .72     .84     .87
                Mean             .75     .84     .87
                St. Error        .05     .03     .03

70%             Parameter        .80     .88     .91
                Mean             .79     .88     .91
                St. Error        .03     .03     .03

80%             Parameter        .88     .94     .96
                Mean             .85     .93     .96
                St. Error        .04     .03     .02

(a) Means and standard errors are based on 50 samples of 30 persons.



Table 4

Means and Standard Errors of Subkoviak Estimates (a)

Mastery         Statistical        Test Length (n)
Criterion (c)   Index             10      30      50

50%             Parameter        .67     .79     .83
                Mean             .66     .81     .84
                St. Error        .06     .04     .03

60%             Parameter        .72     .84     .87
                Mean             .69     .84     .88
                St. Error        .06     .04     .03

70%             Parameter        .80     .88     .91
                Mean             .79     .89     .93
                St. Error        .05     .04     .03

80%             Parameter        .88     .94     .96
                Mean             .90     .95     .97
                St. Error        .05     .03     .02

(a) Means and standard errors are based on 50 samples of 30 persons.



Table 5

Means and Standard Errors of Huynh Estimates (a)

Mastery         Statistical        Test Length (n)
Criterion (c)   Index             10      30      50

50%             Parameter        .67     .79     .83
                Mean             .66     .80     .83
                St. Error        .06     .03     .02

60%             Parameter        .72     .84     .87
                Mean             .67     .82     .86
                St. Error        .06     .03     .02

70%             Parameter        .80     .88     .91
                Mean             .76     .88     .91
                St. Error        .06     .03     .02

80%             Parameter        .88     .94     .96
                Mean             .86     .94     .96
                St. Error        .05     .02     .02

(a) Means and standard errors are based on 50 samples of 30 persons.



Table 6

Means and Standard Errors of Huynh Estimates (a)

Mastery         Statistical        Test Length (n)
Criterion (c)   Index             10      30      50

50%             Parameter        .67     .79     .83
                Mean             .65     .79     .82
                St. Error        .03     .01     .01

60%             Parameter        .72     .84     .87
                Mean             .66     .81     .85
                St. Error        .06     .03     .01

70%             Parameter        .80     .88     .91
                Mean             .74     .88     .91
                St. Error        .06     .01     .01

80%             Parameter        .88     .94     .96
                Mean             .86     .94     .97
                St. Error        .02     .01     .01

(a) Means and standard errors are based on 50 samples of 300 persons.



Estimating the Probability of Correct
Classification in Mastery Testing

Michael J. Subkoviak
University of Wisconsin

Rand Wilcox
University of California at Los Angeles

Running head: Probability of Correct Classification in Mastery Testing



Abstract

A procedure is proposed for estimating the proportion of persons in a
group that are correctly classified on a mastery test, i.e., the proportion
whose observed classification agrees with their true classification. A
numerical example is provided, and extensions of the procedure are discussed.




Estimating the Probability of Correct
Classification in Mastery Testing



Keats and Lord (1962) have proposed a relatively simple mathematical
model for test scores that has a number of practical applications. For
instance, Huynh (1976) and Subkoviak and Albrecht (1977) have demonstrated
empirically that the model can be used to estimate the degree to which
persons are consistently classified as masters or nonmasters on parallel
mastery tests, where a pass-fail test with a cut-off score of 75% correct is
one example of a mastery test.

The purpose of the present paper is to illustrate that the Keats-Lord
model is also useful for estimating the extent to which persons are correctly
classified, i.e., for estimating the proportion of persons whose
classification based on observed score agrees with their classification based
on true score. In addition, extensions of the procedure to the case of
polychotomous classification and further generalizations of the Keats-Lord
model are noted.

The Keats-Lord Model

Let us begin with the following notational definitions:

$n$ = number of test items;
$x$ = number of items correctly answered by an individual;
$p$ = unknown true proportion correct score for an individual, i.e., the
      mean of an individual's observed proportion correct scores ($x/n$)
      over repeated parallel tests;
$\pi$ = mastery criterion expressed as a proportion, e.g., $p \ge \pi$
      implies true mastery and $p < \pi$ implies true nonmastery;
$c$ = mastery criterion expressed as the number of items correct on an
      n-item test, i.e., $x \ge c$ implies observed mastery and $x < c$
      implies observed nonmastery; $c$ equals the smallest integer greater
      than or equal to $n\pi$.

Of interest here is the extent to which persons' observed
mastery-nonmastery classifications (i.e., $x \ge c$ or $x < c$) are correct
or, in other words, agree with their true mastery-nonmastery states (i.e.,
$p \ge \pi$ or $p < \pi$). One natural index of agreement in this sense is the
probability that the observed classification corresponds to the true
classification for a typical examinee. This probability will be symbolized
$Q$, and thus $Q$ represents the probability of correct classification for an
examinee randomly selected from the population of potential examinees.
Mathematically $Q$ can be expressed as

$$Q = P(x < c,\; p < \pi) + P(x \ge c,\; p \ge \pi), \tag{1}$$

where $P(x < c,\, p < \pi)$ and $P(x \ge c,\, p \ge \pi)$ are respectively the
probabilities of correct nonmastery and correct mastery classifications.

Under the assumptions of the Keats-Lord model (discussed below), it can
be shown (see the Appendix) that $Q$ in Equation 1, the probability of a
correct classification, is given by:

$$Q = \frac{1}{B(\ell+1,\, m+1)} \left\{ \sum_{x=0}^{c-1} \binom{n}{x} B(\ell+x+1,\, m+n-x+1)\, I_\pi(\ell+x+1,\, m+n-x+1) + \sum_{x=c}^{n} \binom{n}{x} B(\ell+x+1,\, m+n-x+1) \left[ 1 - I_\pi(\ell+x+1,\, m+n-x+1) \right] \right\} \tag{2}$$

If the mean ($\mu$), variance ($\sigma^2$), and Kuder-Richardson 21
reliability coefficient ($\rho$) of observed x-scores are estimated from a
reasonably large sample of testees, the various terms in Equation 2 can be
evaluated as follows:

$n$ = number of test items;
$\pi$ = mastery criterion expressed as a proportion;
$c$ = smallest integer greater than or equal to $n\pi$;
$\rho = [n/(n-1)][1 - \mu(n-\mu)/(n\sigma^2)]$;
$\ell = \mu(1/\rho - 1) - 1$;
$m = (n-\mu)(1/\rho - 1) - 1$;
$\binom{n}{x} = n!/[(n-x)!\,x!]$;
$B(\,\cdot\,,\cdot\,)$ = beta function, which can be calculated by common
      computer routines or can be found in standard mathematical tables;
$I_\pi(\,\cdot\,,\cdot\,)$ = incomplete beta function, which can also be
      computed by common computer routines or can be found in standard
      mathematical tables.
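As a rough check on Equation 2, the sum can be evaluated directly with the Python standard library. The sketch below is an illustration only: the function names are hypothetical, and the incomplete beta function is approximated by Simpson's rule, which assumes both of its arguments are at least 1 (true whenever $\ell$ and $m$ are nonnegative). A library routine would normally be preferred.

```python
import math

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def inc_beta_ratio(upper, a, b, steps=2000):
    """I_upper(a, b) by Simpson's rule (assumes a >= 1, b >= 1; steps even)."""
    if upper <= 0.0:
        return 0.0
    if upper >= 1.0:
        return 1.0
    h = upper / steps

    def f(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3 / math.exp(log_beta(a, b))

def prob_correct_classification(n, ell, m, pi):
    """Equation 2: probability of a correct mastery/nonmastery classification."""
    c = math.ceil(n * pi - 1e-12)  # smallest integer >= n*pi, guarding float noise
    log_b0 = log_beta(ell + 1, m + 1)
    q = 0.0
    for x in range(n + 1):
        a, b = ell + x + 1, m + n - x + 1
        weight = math.comb(n, x) * math.exp(log_beta(a, b) - log_b0)
        below = inc_beta_ratio(pi, a, b)  # share of true scores below pi
        q += weight * (below if x < c else 1.0 - below)
    return q

# Illustrative values: a 4-item test with l = m = 8.76 and criterion .85.
print(round(prob_correct_classification(n=4, ell=8.76, m=8.76, pi=0.85), 3))
```

Because the weights $\binom{n}{x}B(\ell+x+1,\,m+n-x+1)/B(\ell+1,\,m+1)$ form the beta-binomial distribution of observed scores, they sum to 1, which offers a quick sanity check on any implementation.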

The Keats-Lord model upon which Equation 2 is based involves two basic
assumptions about the nature of true and error scores. It should be noted in
passing that the model and its assumptions have been shown to fit a variety
of real data sets quite adequately (Keats & Lord, 1962). First, it is assumed
that the distribution of true scores ($p$) for the population of examinees is
some member of the beta family of distributions. This family includes
distributions having the usual bell shape, as well as rectangular and U
shapes (see LaValle, 1970, p. 256 for more examples). As such, the beta
assumption is quite liberal in that it accommodates a wide range of possible
true score distributions.

Second, if n-item tests were repeatedly administered to a single
individual with true score $p$, it is assumed that his or her distribution of
observed scores ($x$) would be binomial with parameters $n$ and $p$. This
assumption tends to follow, for instance, if items are scored 0 and 1, if the
outcome on one item does not affect the outcome on others, and if items are
equally difficult. While the latter two conditions rarely, if ever, occur in
practice, the quality of empirical results reported by Keats and Lord (1962)
seems to indicate that the model is robust with respect to such violations.



Example

Suppose an n = 4 item test was administered to N = 100 students, and
suppose the test scores x = 0, 1, 2, 3, and 4 occurred with frequencies
f(x) = 8, 25, 34, 25, and 8, respectively. The sample mean and variance are
$\hat\mu = \sum x f(x)/N = 2.00$ and
$\hat\sigma^2 = \sum (x-\hat\mu)^2 f(x)/(N-1) = 1.15$; and the estimate of
KR21 is $\hat\rho = [n/(n-1)][1 - \hat\mu(n-\hat\mu)/(n\hat\sigma^2)] = .17$.
The quantities $\ell$ and $m$ required by Equation 2 are equal in this
particular instance, i.e., $\ell = \hat\mu(1/\hat\rho - 1) - 1 = 8.76$ and
$m = (n-\hat\mu)(1/\hat\rho - 1) - 1 = 8.76$; but this is not true in general.
Now if the mastery criterion is set at, say, $\pi = .85$, then $c = 4$ in
Equation 2, since 4 is the smallest integer greater than or equal to
$n\pi = 3.40$.
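The arithmetic of this example is easy to retrace in code. The sketch below (the function name `keats_lord_inputs` is hypothetical) computes the mean, variance, KR21, $\ell$, and $m$ from the score frequencies. One point worth noticing: carried at full precision, KR21 is about .175 and $\ell = m = 8.4$; the value 8.76 in the text results from substituting KR21 rounded to .17.

```python
def keats_lord_inputs(freq, n):
    """Mean, variance, KR21, l, and m from frequencies of scores 0..n."""
    total = sum(freq)
    mean = sum(x * f for x, f in enumerate(freq)) / total
    var = sum((x - mean) ** 2 * f for x, f in enumerate(freq)) / (total - 1)
    rho = (n / (n - 1)) * (1 - mean * (n - mean) / (n * var))  # KR21
    ell = mean * (1 / rho - 1) - 1
    m = (n - mean) * (1 / rho - 1) - 1
    return mean, var, rho, ell, m

mean, var, rho, ell, m = keats_lord_inputs([8, 25, 34, 25, 8], n=4)
print(mean, round(var, 2), round(rho, 3), round(ell, 2), round(m, 2))

# Same formula with KR21 rounded to two decimals, as in the text.
ell_rounded = mean * (1 / 0.17 - 1) - 1
print(round(ell_rounded, 2))
```

The difference between the two versions of $\ell$ is a reminder that these parameter estimates are sensitive to rounding when the reliability is low.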

Summarizing, the parameters required by Equation 2 are: $n = 4$,
$\ell = m = 8.76$, $\pi = .85$, and $c = 4$. Thus by Equation 2 the
probability of a correct classification is:

$$Q = \frac{1}{B(9.76,\, 9.76)} \left\{ \sum_{x=0}^{3} \binom{4}{x} B(9.76+x,\, 13.76-x)\, I_{.85}(9.76+x,\, 13.76-x) + \sum_{x=4}^{4} \binom{4}{x} B(9.76+x,\, 13.76-x) \left[ 1 - I_{.85}(9.76+x,\, 13.76-x) \right] \right\}$$

where $\binom{4}{x}$, $B(\,\cdot\,,\cdot\,)$, and $I_{.85}(\,\cdot\,,\cdot\,)$
can be obtained via computer or standard mathematical tables (e.g., Pearson,
1956) and are as shown in Table 1.

Insert Table 1 about here

Substituting these values into the above equation gives the following result:

$$Q = \frac{1}{1.5 \times 10^{-6}} \left\{ (1)(.12 \times 10^{-6})(.99) + (4)(.95 \times 10^{-7})(.99) + (6)(.87 \times 10^{-7})(.99) + (4)(.95 \times 10^{-7})(.99) + (1)(.12 \times 10^{-6})(1 - .99) \right\} = .93$$

Thus, it is likely that 93 out of the 100 students tested are correctly
classified.



Discussion

In the example above, $Q = .93$ when the mastery criterion is set at
$\pi = .85$. However, larger or smaller values of $Q$ could be obtained for
this same data set by simply changing $\pi$. In fact, the magnitude of $Q$ is
affected by a number of such factors. Specifically, the probability of
correct classification, $Q$, tends to increase: (a) as the density of true
scores about the mastery criterion decreases and (b) as the number of test
items increases. Thus, $Q$ would assume larger values for settings of the
$\pi$-criterion in the tails of a bell-shaped distribution (as in the example
above) than for settings in the middle of the distribution. Secondly, for a
particular setting of $\pi$ such as .85, $Q$ would be larger for a 2n-item
test than for an n-item test.

In regard to generalization, it is worth noting that the probability of
correct classification can also be estimated for the case of three or more
levels of mastery. For example, let $\pi_1$ and $\pi_2$ (where
$\pi_1 < \pi_2$) represent different degrees of mastery on the true score
scale; and let $c_1$ and $c_2$ be the corresponding criteria on the observed
score scale. Three categories of mastery are thus possible, and Equation 1
can be generalized as follows:

$$Q = P(x < c_1,\; p < \pi_1) + P(c_1 \le x < c_2,\; \pi_1 \le p < \pi_2) + P(c_2 \le x,\; \pi_2 \le p) \tag{3}$$

The argument outlined in the Appendix can then be applied to Equation 3 to
obtain an expression similar to Equation 2.

Extensions of the Keats-Lord model, on which Equation 2 is based, are also possible. For example, Lord (1965) has suggested a slightly more complex model that involves somewhat more general and reasonable assumptions regarding true and error score distributions (see also Lord, 1969). However, a Monte Carlo study by Wilcox (1977) seems to suggest that the more complex model adds little in the way of accuracy to probability estimates, which are of interest here. Of course this does not imply that the more complex model lacks utility for certain other purposes, e.g., for simulating or estimating complete bivariate distributions of true and observed scores.



Appendix

The beta assumption of the Keats-Lord model implies that a true score, $\zeta$, can assume any value within the interval 0 to 1; while the binomial assumption implies that an observed score, $x$, takes on only integral values between 0 and $n$. Thus, Equation 1 can be written as a double summation of $P(x,\zeta)$ values, where $P(x,\zeta)$ represents the joint probability distribution of examinees' observed and true scores:

$$p_c = \sum_{x=0}^{c-1}\left[\int_0^{\pi} P(x,\zeta)\,d\zeta\right] + \sum_{x=c}^{n}\left[\int_{\pi}^{1} P(x,\zeta)\,d\zeta\right], \qquad (4)$$

where $P(x,\zeta) = \left[\binom{n}{x} / B(\ell+1, m+1)\right]\zeta^{\ell+x}(1-\zeta)^{m+n-x}$ (Keats & Lord, 1962, p. 69).

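The first integral in Equation 4 can be expressed as follows:

$$
\begin{aligned}
\int_0^{\pi} P(x,\zeta)\,d\zeta &= \left[\binom{n}{x} / B(\ell+1,m+1)\right]\int_0^{\pi}\zeta^{\ell+x}(1-\zeta)^{m+n-x}\,d\zeta \\
&= \left[B(\ell+x+1,m+n-x+1)/B(\ell+x+1,m+n-x+1)\right]\left[\binom{n}{x} / B(\ell+1,m+1)\right]\int_0^{\pi}\zeta^{\ell+x}(1-\zeta)^{m+n-x}\,d\zeta \\
&= \left[B(\ell+x+1,m+n-x+1)\binom{n}{x} / B(\ell+1,m+1)\right]\int_0^{\pi}\left[1/B(\ell+x+1,m+n-x+1)\right]\zeta^{\ell+x}(1-\zeta)^{m+n-x}\,d\zeta \\
&= \left[B(\ell+x+1,m+n-x+1)\binom{n}{x} / B(\ell+1,m+1)\right] I_{\pi}(\ell+x+1,m+n-x+1), \qquad (5)
\end{aligned}
$$

where $\binom{n}{x}$, $B(\cdot,\cdot)$, and $I_{\pi}(\cdot,\cdot)$ are respectively the binomial coefficient, the beta function, and the incomplete beta function.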

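Similarly the second integral in Equation 4 can be written:

$$
\begin{aligned}
\int_{\pi}^{1} P(x,\zeta)\,d\zeta &= \left[\binom{n}{x} / B(\ell+1,m+1)\right]\int_{\pi}^{1}\zeta^{\ell+x}(1-\zeta)^{m+n-x}\,d\zeta \\
&= \left[B(\ell+x+1,m+n-x+1)/B(\ell+x+1,m+n-x+1)\right]\left[\binom{n}{x} / B(\ell+1,m+1)\right]\int_{\pi}^{1}\zeta^{\ell+x}(1-\zeta)^{m+n-x}\,d\zeta \\
&= \left[B(\ell+x+1,m+n-x+1)\binom{n}{x} / B(\ell+1,m+1)\right]\int_{\pi}^{1}\left[1/B(\ell+x+1,m+n-x+1)\right]\zeta^{\ell+x}(1-\zeta)^{m+n-x}\,d\zeta \\
&= \left[B(\ell+x+1,m+n-x+1)\binom{n}{x} / B(\ell+1,m+1)\right]\left[1 - I_{\pi}(\ell+x+1,m+n-x+1)\right]. \qquad (6)
\end{aligned}
$$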



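Finally, Equation 2 is obtained by substituting 5 and 6 into 4:

$$
\begin{aligned}
p_c &= \sum_{x=0}^{c-1}\left[B(\ell+x+1,m+n-x+1)\binom{n}{x} / B(\ell+1,m+1)\right] I_{\pi}(\ell+x+1,m+n-x+1) \\
&\qquad + \sum_{x=c}^{n}\left[B(\ell+x+1,m+n-x+1)\binom{n}{x} / B(\ell+1,m+1)\right]\left[1 - I_{\pi}(\ell+x+1,m+n-x+1)\right] \\
&= \{1/B(\ell+1,m+1)\}\Big\{\sum_{x=0}^{c-1}\binom{n}{x}B(\ell+x+1,m+n-x+1)\,I_{\pi}(\ell+x+1,m+n-x+1) \\
&\qquad + \sum_{x=c}^{n}\binom{n}{x}B(\ell+x+1,m+n-x+1)\left[1 - I_{\pi}(\ell+x+1,m+n-x+1)\right]\Big\}.
\end{aligned}
$$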



References

Griffiths, D. A. Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of a disease. Biometrics, 1973, 29, 637-648.

Huynh, H. On consistency of decisions in criterion-referenced testing. Journal of Educational Measurement, 1976, 13, 253-264.

Keats, J. A., & Lord, F. M. A theoretical distribution for mental test scores. Psychometrika, 1962, 27, 59-72.

LaValle, I. H. An Introduction to Probability, Decision, and Inference. New York: Holt, Rinehart and Winston, 1970.

Lord, F. M. A strong true-score theory, with applications. Psychometrika, 1965, 30, 239-270.

Lord, F. M. Estimating true-score distributions in psychological testing (An empirical Bayes estimation problem). Psychometrika, 1969, 34, 259-299.

Pearson, K. (Ed.) Tables of the Incomplete Beta-Function. New Rochelle, N.Y.: Cambridge University Press, 1956.

Shenton, L. R. Maximum likelihood and the efficiency of the method of moments. Biometrika, 1950, 37, 111-116.

Subkoviak, M. J., & Albrecht, fi. A. Empirical investigation of procedures for estimating reliability for mastery tests. Manuscript submitted for publication, 1977.

Wilcox, R. Estimating the likelihood of false-positive and false-negative decisions in mastery testing: An empirical Bayes approach. Journal of Educational Statistics, 1977, in press.




Footnotes

Michael Subkoviak and Rand Wilcox were supported respectively by National Institute of Education Grants No. NIE-G-76-0088 and NIE-G-76-0083. Equal authorship is implied.

Maximum likelihood estimates of $\ell$ and $m$ are also possible (see Griffiths, 1973). However, when the number of examinees is large, results reported by Shenton (1950) would seem to imply little difference between maximum likelihood estimates and the moment estimates of $\ell$ and $m$ given here.



Table 1

Values of $\binom{4}{x}$, $B(\cdot,\cdot)$ and $I_{.85}(\cdot,\cdot)$ for the Numerical Example

  x    C(4,x)    B(9.76+x, 13.76-x)    I_.85(9.76+x, 13.76-x)
  0      1         .12 x 10^-6                .99
  1      4         .95 x 10^-7                .99
  2      6         .87 x 10^-7                .99
  3      4         .95 x 10^-7                .99
  4      1         .12 x 10^-6                .99

B(9.76, 9.76) = .15 x 10^-5




Further Comments on Reliability for Mastery Tests
Michael J. Subkoviak
University of Wisconsin

Running head: Classification Consistency for Mastery Tests


Abstract 



This paper illustrates that the various coefficients of classification consistency that have been proposed as measures of reliability for mastery tests have different interpretations and statistical properties. As such, they should not be applied indiscriminately. Rather, a user should employ that coefficient that is most meaningful within the context of a particular problem.



Further Comments on Reliability for Mastery Tests


Huynh's recent critique can be reduced essentially to two points (1977). First, he argues that the kappa coefficient ($\kappa$) is preferable to the simple proportion of consistent classifications on two tests ($P_c$) as an index of reliability for mastery tests. Second, he notes that the procedure discussed by Huynh (1976) for estimating these indices is mathematically more tractable than methods requiring the estimation of a testee's true ability (see Marshall & Haertel, 1976; Subkoviak, 1976).

Regarding the first point, it is not obvious that kappa is the only index of reliability that a user may wish to consider. The purpose of this paper is to demonstrate that coefficients like kappa or the proportion of consistent classifications on two tests have different interpretations and different statistical properties. Thus, such coefficients should not be used indiscriminately. Rather, a user should purposively select that coefficient whose properties are most suited to the intended application.

I quite agree with the second point regarding the utility of Huynh's procedure. In fact, since Huynh's method is based on a simple model for test scores proposed by Keats and Lord in 1962, similar procedures based on more recent and more sophisticated models are likely to provide even better estimates of reliability (see Lord, 1965, 1969).

Possible Reliability Coefficients for Mastery Tests

For simplicity, the following discussion is couched in terms of a mastery test, in which a single cutting score (C) is used to classify persons as masters or nonmasters. However, generalization to tests involving multiple cutting scores and polychotomous classifications is immediate.

If a score of C or greater represents mastery on a test, then the proportion of persons consistently classified as master/master or nonmaster/nonmaster on two testings ($P_c$) is an indicator of the replicability of mastery-nonmastery outcomes. Goodman and Kruskal (1954, p. 758) proposed the use of $P_c$ as a reliability coefficient; however, Huynh (1977) essentially rejects it on the grounds that $P_c$ does not assume values between 1 and 0, as do traditional, norm-referenced reliability coefficients. Rather, $P_c$ assumes values between 1 and $P_{chance}$, where $P_{chance} = [P^2(x \ge C) + P^2(x < C)] \ge 1/2$. $P(x \ge C)$ and $P(x < C)$ are the proportions of masters and nonmasters in the group tested.

Goodman and Kruskal counter such objections as follows:

    Conventions like these [requiring that an index assume values
    between 1 and 0] have seemed important to some authors, but we
    believe they diminish in importance as the meaningfulness of the
    measure of association increases. One real danger connected with
    such conventions is that the investigator may carry over size
    preconceptions based upon experience with completely different
    measures subject to the same conventions. (p. 738)

Goodman and Kruskal conclude that the proportion of consistent classifications on two tests, $P_c$, is a meaningful and thus valid index of reliability. Furthermore, since the range of $P_c$ is other than 1 to 0, norm-referenced standards of "good" and "poor" reliability are unlikely to be mistakenly applied to $P_c$.

For illustrative purposes, a unimodal distribution of scores ($x$) for a population on a four-item test is shown in the first two columns of Table 1. Using the procedure discussed by Huynh (1976) and also by Keats and Lord (1962), these single test administration data can be used to generate a bivariate distribution of scores ($x$ and $x'$) that would be expected for two test administrations (not shown). This bivariate scatterplot, in turn, can be used to compute $P_c$, the proportion of consistent mastery/mastery and nonmastery/nonmastery outcomes on the two tests.

Insert Table 1 about here



For example, if the criterion of mastery is set at C = 4 in Table 1, the proportions of consistent mastery/mastery and nonmastery/nonmastery outcomes in the associated bivariate scatterplot are respectively $P(x \ge 4, x' \ge 4) = .01$ and $P(x < 4, x' < 4) = .85$, where $x$ and $x'$ represent scores on different test administrations. Thus, the total proportion of consistent outcomes when C = 4 is $P_c = .01 + .85 = .86$, as shown in the third column of Table 1. The proportions of consistent classifications for other values of C in Table 1 are similarly computed to be $P_c = .61$ for C = 3, $.61$ for C = 2, $.86$ for C = 1, and $1.00$ for C = 0.

Notice that the proportion of consistent classifications in the $P_c$ column of Table 1 increases as the criterion (C) moves away from the central concentration of scores at $x = 2$ into either tail of the distribution. In other words, $P_c$ is a U-shaped function of C for this particular set of data.

Now suppose one wished to compare the observed proportion of consistent classifications on two tests, $P_c$, to the proportion of consistent outcomes expected if mastery-nonmastery decisions for each student were made instead by flipping a fair coin twice. Since the expected proportion of consistent decisions in the latter case is one-half (1/2), the difference ($P_c - 1/2$) indicates how much more consistent the actual decision process is than the random process just described. In addition, the transformation $(P_c - 1/2)/(1 - 1/2) = 2P_c - 1$ provides an index that assumes values between 1 and zero, if desired. If the actual decision process is completely consistent then $2P_c - 1$ equals 1; if the actual process is no better than the random process, then the index equals 0; generally the value is somewhere in between.

Values of $2P_c - 1$ are shown in the fourth column of Table 1. Since $2P_c - 1$ is a simple linear function of $P_c$, both coefficients are U-shaped functions of C for this particular data set. However, $P_c$ is always greater than or equal to $2P_c - 1$; so the same standards of "good" and "poor" cannot be applied to both.

Finally, one might again compare the observed proportion of consistent outcomes, $P_c$, to the proportion expected by twice flipping a coin biased according to the relative proportions of masters and nonmasters in the group tested. If $P(x \ge C)$ and $P(x < C)$ are the proportions of masters and nonmasters in the population tested, then the expected proportion of consistent decisions in this case is $P_{chance} = [P^2(x \ge C) + P^2(x < C)] \ge 1/2$. Thus, the kappa coefficient defined by $\kappa = (P_c - P_{chance})/(1 - P_{chance})$ indicates how much more consistent the actual decision process is than this latter random process (Cohen, 1960). Kappa equals 1 if the actual process is perfectly consistent and equals zero if the actual process is no better than random (in the latter sense).

For example, if C = 4 in Table 1, then $P_{chance} = P^2(x \ge C) + P^2(x < C) = (.08)^2 + (.25 + .34 + .25 + .08)^2 = .85$. Thus, $\kappa = (P_c - P_{chance})/(1 - P_{chance}) = (.86 - .85)/(1 - .85) = .06$. The other values of kappa in Table 1 were similarly computed. Kappa is undefined at C = 0 because the term $(1 - P_{chance})$ in the denominator of kappa equals zero in this case.
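
The three coefficients can be reproduced approximately with a short computation. The sketch below is an illustration rather than the procedure actually used for Table 1: it assumes the beta-binomial (Keats-Lord) model with both beta parameters equal to 9.76, values borrowed from the numerical example in the preceding paper because they appear to reproduce the score distribution .08, .25, .34, .25, .08. Function and variable names are hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the complete beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def joint_prob(x, y, n, a, b):
    """P(x, x') for two parallel n-item tests under the beta-binomial model.

    Given the true score, the two observed scores are independent binomials;
    the true score follows a beta distribution with parameters a and b.
    """
    log_p = log_beta(a + x + y, b + 2 * n - x - y) - log_beta(a, b)
    return math.comb(n, x) * math.comb(n, y) * math.exp(log_p)


def consistency_indices(n, a, b, cut):
    """Return (P_c, 2*P_c - 1, kappa) for cutting score `cut`."""
    scores = range(n + 1)
    both_master = sum(joint_prob(x, y, n, a, b)
                      for x in scores for y in scores if x >= cut and y >= cut)
    both_nonmaster = sum(joint_prob(x, y, n, a, b)
                         for x in scores for y in scores if x < cut and y < cut)
    # Marginal proportion of masters on one testing.
    masters = sum(joint_prob(x, y, n, a, b)
                  for x in scores for y in scores if x >= cut)
    p_c = both_master + both_nonmaster
    p_chance = masters ** 2 + (1.0 - masters) ** 2
    # Kappa is undefined when everyone falls in one category (P_chance = 1).
    kappa = None if 1.0 - p_chance < 1e-12 else (p_c - p_chance) / (1.0 - p_chance)
    return p_c, 2.0 * p_c - 1.0, kappa


# Assumed parameters: 4 items, beta parameters 9.76 and 9.76.
for cut in range(5):
    print(cut, consistency_indices(4, 9.76, 9.76, cut))
```

Because kappa is computed here from unrounded proportions, it can differ slightly from the value obtained by substituting the two-decimal entries .86 and .85 into the formula by hand.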
Notice in Table 1 that kappa, as a function of C, has an inverted U-shape, just the opposite of coefficients $P_c$ and $2P_c - 1$. Obviously, then, consistency as measured by kappa differs from consistency as measured by $P_c$ or $2P_c - 1$.




But what accounts for this difference? The answer lies in the term $P_{chance}$ that occurs in coefficient kappa. $P_c$ represents the total proportion of consistent mastery/mastery and nonmastery/nonmastery decisions that occur when a test is administered to a group composed of particular percentages of masters and nonmasters. If the group is largely composed of either masters or nonmasters, a major portion of the observed consistency, $P_c$, is attributable to the group constitution. (This is also true of $2P_c - 1$, which is a linear function of $P_c$.) For example, if a group is largely made up of masters, then consistent mastery/mastery decisions are quite likely to occur. Now the term $P_{chance} = [P^2(x \ge C) + P^2(x < C)]$ that occurs in coefficient kappa represents that portion of the observed consistency, $P_c$, that is due to the particular distribution of mastery and nonmastery in the group tested. Thus, coefficient kappa, $\kappa = (P_c - P_{chance})/(1 - P_{chance})$, represents the proportion of consistent decisions attributable to factors other than group composition, such as the test itself, the conditions of administration, etc.

As such, the choice of $P_c$ (or $2P_c - 1$) vis-a-vis $\kappa$ would seem to depend upon whether or not the user wishes to consider the group as an element of the decision process. In other words, if one is interested in the totality of consistent classifications that occur for a particular combination of test and group, then an index like $P_c$ would seem to be appropriate. If one wishes to discount the effect of group composition on the decision process, then a kappa-like coefficient might be appropriate. It might be noted that there is ample precedent for an index like $P_c$ that includes group effects. The conventional norm-referenced reliability coefficient, for example, is very much affected by the magnitude of true score variance for the particular group tested.




Sampling Error

The extent to which numerical estimates of an index vary from one sample to the next is another matter of some importance, particularly if such estimates are based on rather small, classroom-size samples (N). For example, if random samples of 50 students were repeatedly drawn from the population of Table 1 and if estimates of the three coefficients ($P_c$, $2P_c - 1$, and $\kappa$) were repeatedly computed for each sample of test scores, the standard errors of the estimates would be approximately as shown in Table 2. For instance, if the criterion of mastery is C = 4, Table 2 indicates that estimates of $P_c$, $2P_c - 1$, and $\kappa$ for samples of size 50 have associated standard errors of .05, .10, and .17 respectively.



Insert Table 2 about here



Comparing the values in Table 2, it is seen that the standard error of kappa tends to be generally larger than that of the other two estimators. This stems from the fact that kappa, $\kappa = (P_c - P_{chance})/(1 - P_{chance})$, is a ratio of random variables; whereas the other two estimators are not. Thus, it would be important to insure adequate sample size, particularly when estimating kappa.

The values of Table 2 were obtained using the following approximations to the variance of the three estimators (Fleiss, Cohen, & Everitt, 1969; Hubert, 1977): (a) $\sigma^2(P_c) = [P_c(1 - P_c)]/N$, (b) $\sigma^2(2P_c - 1) = [4P_c(1 - P_c)]/N$, and (c)

$$\sigma^2(\kappa) = \frac{\sum_{i} P_{ii}\,[(1 - P_{chance}) - (P_{i.} + P_{.i})(1 - P_c)]^2 + (1 - P_c)^2 \sum_{i \ne j} P_{ij}(P_{.i} + P_{j.})^2 - (P_c P_{chance} - 2P_{chance} + P_c)^2}{N(1 - P_{chance})^4}.$$

Here $P_{ij}$ stands for the proportion in the $ij$th cell of the 2 x 2 contingency table representing mastery-nonmastery outcomes on two testings; $P_{i.}$ and $P_{.j}$ represent the $i$th row and $j$th column (marginal) proportions of the 2 x 2 table; and the other symbols are as previously defined. It should be noted that these formulae are based on the assumption that a binomial distribution is responsible for generating the 2 x 2 contingency table and that the sample of students is reasonably large (see Hubert, 1977).
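
As a check on the magnitudes involved, the three large-sample approximations can be evaluated for the C = 4 case. This is only a sketch under stated assumptions: the consistent cells .01 and .85 come from the example in the text, while the two inconsistent cells are assumed equal and set to (1 - .86)/2 = .07; the function name and argument order are hypothetical. The chance term is computed as the sum of products of row and column marginals, which reduces to the squared-marginal form used in the text when both testings have the same marginal proportions.

```python
import math


def standard_errors(p11, p12, p21, p22, n_students):
    """Approximate standard errors of P_c, 2*P_c - 1, and kappa for a 2 x 2 table.

    p11 = master/master and p22 = nonmaster/nonmaster proportions;
    p12 and p21 are the inconsistent cells. The four values should sum to 1.
    """
    cells = [[p11, p12], [p21, p22]]
    row = [p11 + p12, p21 + p22]  # P_i. (first testing)
    col = [p11 + p21, p12 + p22]  # P_.j (second testing)
    p_c = p11 + p22
    p_chance = row[0] * col[0] + row[1] * col[1]

    var_pc = p_c * (1.0 - p_c) / n_students
    var_linear = 4.0 * var_pc

    diagonal = sum(
        cells[i][i] * ((1.0 - p_chance) - (row[i] + col[i]) * (1.0 - p_c)) ** 2
        for i in range(2)
    )
    off_diagonal = (1.0 - p_c) ** 2 * sum(
        cells[i][j] * (col[i] + row[j]) ** 2
        for i in range(2) for j in range(2) if i != j
    )
    correction = (p_c * p_chance - 2.0 * p_chance + p_c) ** 2
    var_kappa = (diagonal + off_diagonal - correction) / (
        n_students * (1.0 - p_chance) ** 4
    )
    return math.sqrt(var_pc), math.sqrt(var_linear), math.sqrt(var_kappa)


# Assumed cell proportions for C = 4 and samples of 50 students.
se_pc, se_linear, se_kappa = standard_errors(0.01, 0.07, 0.07, 0.85, 50)
print(round(se_pc, 2), round(se_linear, 2), round(se_kappa, 2))
```

The kappa variance is a small difference between larger terms, so it is sensitive to rounding in the cell proportions; unrounded proportions are preferable when they are available.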



Conclusion

The basic message of this paper is that the various measures of classification consistency thus far proposed for mastery tests have different interpretations and statistical properties. As such, they should not be used blindly. Rather, a user should deliberately select that coefficient that is most meaningful within the context of a particular problem. In making this decision, the user might also wish to consider coefficients that reflect nearly consistent classifications (Goodman & Kruskal, 1954, p. 758) or coefficients that reflect the stability over repeated testing of score deviations about criterion C (Brennan & Kane, 1977).

* 






References

Brennan, R. L., & Kane, M. T. An index of dependability for mastery tests. Journal of Educational Measurement, 1977, in press.

Cohen, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 1960, 20, 37-46.

Fleiss, J. L., Cohen, J., & Everitt, B. S. Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 1969, 72, 323-327.

Goodman, L. A., & Kruskal, W. H. Measures of association for cross classifications. Journal of the American Statistical Association, 1954, 49, 732-764.

Hubert, L. J. Kappa revisited. Psychological Bulletin, 1977, in press.

Huynh, H. Reliability of decisions in domain-referenced testing. Journal of Educational Measurement, 1976, 13, 253-264.

Huynh, H. Reliability of criterion-referenced tests: Comments on a paper by Subkoviak. Journal of Educational Measurement, 1977, submitted.

Keats, J. A., & Lord, F. M. A theoretical distribution for mental test scores. Psychometrika, 1962, 27, 59-72.

Lord, F. M. A strong true-score theory, with applications. Psychometrika, 1965, 30, 239-270.

Lord, F. M. Estimating true-score distributions in psychological testing (An empirical Bayes estimation problem). Psychometrika, 1969, 34, 259-299.

Marshall, J. L., & Haertel, E. H. The mean split-half coefficient of agreement: A single administration index of reliability for mastery tests. Unpublished manuscript, 1976. (Available from Department of Educational Psychology, 1025 West Johnson St., Madison, Wisconsin 53706).

Subkoviak, M. J. Estimating reliability from a single administration of a mastery test. Journal of Educational Measurement, 1976, 13, 265-276.





Table 1

Reliability as a Function of
Criterion Score for Various Coefficients

  x or C    Proportion     P_c     2P_c - 1     kappa
            of scores
    0          .08         1.00      1.00      undefined
    1          .25          .86       .72         .06
    2          .34          .61       .22         .11
    3          .25          .61       .22         .11
    4          .08          .86       .72         .06













Table 2

Estimates of Standard Error for Samples of 50 Students

    C        P_c     2P_c - 1     kappa
    1        .05       .10         .17
    2        .07       .14         .14
    3        .07       .14         .14
    4        .05       .10         .17




Confirmatory Inference and Geometric Models
Lawrence J. Hubert
Michael J. Subkoviak
The University of Wisconsin, Madison




Abstract 



A confirmatory method is discussed for comparing an outside variable to a given geometric model, or alternatively, to the raw data from which the model is derived. The inference procedure is based on relatively simple nonparametric principles and requires the comparison of a proximity matrix generated from a geometric representation against a second "structure" matrix obtained from the outside variable under study. A number of examples are presented that illustrate how the same statistical approach can be applied in evaluating geometric models that arise in a number of ways, for instance, those produced by some explicit data reduction process, or possibly, models generated by naturally occurring spatial contiguity.



Confirmatory Inference and Geometric Models



In data analysis over the last twenty years, a distinction between exploratory and confirmatory procedures has become very popular (see Kaiser, 1970). Supposedly, an exploratory strategy involves the use of some analysis technique on a given data set with the aim of identifying interesting relationships, patterns, and the like. On the other hand, a confirmatory approach requires the statement of a rather strong a priori conjecture which is then tested directly against the available data. It is assumed that these latter hypotheses are derived from a source outside the data actually used for the purposes of validation, possibly from the results of some previous exploratory study. Unfortunately, the word "exploratory" has gained such legitimacy that it now serves too often as a way of justifying isolated empirical studies that an investigator has no intention whatsoever of pursuing further, or more seriously, as a cover for a superficial theoretical conceptualization and a hurried research agenda.

. ,- . " _ ' 

Given the influence of classical statistics on the conduct of experimental studies in the behavioral sciences, the trend toward a confirmatory approach to research was very strong after World War II. More recently, however, inexpensive exploratory computational routines have become widely available, which has encouraged some attitude of "faddishness" in the analysis of data, irrespective of whether the methods chosen are appropriate for the problem or could possibly lead to any increased understanding of the area under study. In particular, methodology to the novice can easily be more important than substance when faced with the vast array of very elegant data reduction techniques, such as multidimensional scaling, cluster analysis, and similar paradigms. Too often, the intricacies of a methodology become more crucial than the original research question, and in fact, even for substantively oriented investigators, there is some danger of using data more as a vehicle for displaying a specific statistical method than as the major reason for which an empirical analysis is carried out in the first place.

Although it may be obvious that confirmatory analyses would be desirable as an adjunct to many of the current approaches used in the study of proximity matrices (such as clustering and multidimensional scaling), very few techniques have been proposed in the literature that could help carry out such a program with any degree of rigor. Typically, users of the newer data reduction procedures operate more or less atheoretically, or at best, try to interpret their results in terms of intuitively reasonable arguments based on outside information regarding the objects or entities being studied. It is somewhat surprising that this same information which can be invoked in explaining the results of an analysis in a post-hoc fashion is not being used more directly in some confirmatory manner and without the imposition of an intermediate data reduction process.

With the motivation in mind of using more fully whatever outside information is available for a given data set, this paper will attempt to review a number of approaches to the analysis of proximity data based on confirmatory principles. Although it is accepted that exploratory strategies are of interest in generating insight and formal hypotheses for later verification, it would be of value at the same time to have a more complete set of confirmatory procedures that could be used in evaluating or supplementing exploratory results in a rigorous fashion. The discussion to follow is organized into several major subsections that illustrate how specific data analysis problems can be attacked within a common nonparametric confirmatory strategy. In particular, to limit the scope of the discussion, our emphasis will be on geometric models, or more specifically, on data representations that have some explicit geometric interpretation. Within this context, our aim is to demonstrate how an obtained geometric representation may be evaluated against available outside information, or alternatively, how such representations in some cases might be bypassed altogether. Since parts of this material are not entirely new but are scattered throughout the literature, appropriate references will be included for the reader interested in pursuing a more complete presentation of the various topics introduced.




Confirmatory Strategies and Geometric Models

As a tactic for explaining how a confirmatory approach to data analysis might be carried out, a number of specific problems will be discussed illustrating the necessary concepts in a concrete manner. In particular, four topic areas are introduced in the sections to follow that demonstrate how a specific confirmatory strategy based on very simple nonparametric principles may be applied in several different ways. Depending on the context, it is conceivable that our method might be used instead of a geometric model; in extending an existing analysis strategy based on geometric notions; as a means of interpreting a given model with respect to outside information; or finally, as a preliminary to the construction of a desired geometric representation. The first example below formalizes the basic ideas to be used throughout the paper, and specifically, illustrates how a geometric model might be bypassed altogether when strong a priori conjectures are available.




In a recent study concerned with "goodness" of patterns, Glushko (1975) attempted to verify Garner's (1962) basic hypothesis regarding what makes one pattern better than another. To be more specific, each of the 17 patterns used by Glushko, listed in Table 1, can be characterized by the size of an inferred equivalence class. The term "equivalence" is used to label the set of patterns that contain a single figure plus all other configurations that result from reflections and/or 90 degree rigid rotations. As indicated in Table 1, two of the Glushko patterns construct the same configuration under all of these operations, 8 patterns have 4 associated figures, and finally, 7 patterns produce 8 different members in their classes. According to Garner, pattern goodness is a direct function of the size of a configuration's inferred equivalence class, with the smaller size classes corresponding to the "better" patterns.

Insert Table 1 about here

To test Garner's hypothesis using the 17 figures of Table 1, Glushko, first of all, obtained a symmetric measure of proximity between each pair of patterns using a choice task. Twenty subjects were presented all 136 different pattern combinations and were asked to indicate preference. These choices were then summed over subjects and subtracted from an expected preference frequency of 10. The absolute values of these differences, given in the lower triangular portion of Table 2, form a symmetric measure of proximity defined for all pattern pairs and provide data in a form that can be subjected to a variety of data reduction techniques.

Insert Table 2 about here

In particular, Glushko attempted to represent the structure of the proximity function by, first of all, placing the 17 configurations in a two-dimensional space using Kruskal's (1964a,b) well-known multidimensional scaling routine. Given this geometric representation, Johnson's (1967) diameter clustering results were then superimposed, producing a representation similar to that we give in Figure 1 (here, we only indicate the clustering result defined by three subclasses). Clearly, one strong dimension (the vertical) can be identified as that of equivalence class size. In addition, the clusters themselves correspond fairly well to a grouping on the basis of the same criterion except for the minor misplacement of the two configurations numbered 10 and 11.

Insert Figure 1 about here



The process of verifying Garner's hypothesis through multidimensional scaling and clustering seems rather circuitous, especially since the equivalence class hypothesis implies a definite structure for the original proximity measure. Although the clustering and scaling results in this case are clear-cut, unambiguous outcomes of this type are rare, to say the least. Unfortunately, when a strong hypothesis is not reflected as dramatically in the scaling or clustering results, it may be difficult to decide whether the hypothesis is inadequate or the data reduction techniques are at fault. In the typical application, the researcher may be able to identify portions of his theory in a scaling or clustering solution but lacks a strategy for measuring in any precise manner the actual degree of confirmation or nonconfirmation.

As an alternative approach, it should be possible to test in a direct
manner whether the pattern goodness hypothesis is reflected in the original
proximities and bypass the scaling and clustering solutions altogether. To
introduce some notation, suppose we denote the patterns as $o_1, o_2, \ldots, o_n$
(where $n$ is 17 in our example). Furthermore, let $q(o_i, o_j)$ refer to the
symmetric proximity between patterns $o_i$ and $o_j$, and Q to an organization of
these measures into a 17 by 17 square matrix with rows and columns labeled by the
objects or patterns $o_1, o_2, \ldots, o_n$. By convention, the diagonal of Q is
assumed to consist entirely of zeroes. In addition to the empirical proximity
matrix Q, the stated hypothesis will be represented numerically by a second
"structure" matrix C with elements $c(o_i, o_j)$. Explicitly, suppose $N(o_i)$
denotes the size of the inferred equivalence class for object $o_i$, and let $f$
be some monotone function on the integers, e.g., $f(x) > f(y)$ if and only if
$x > y$. Then, as a formal definition,

$$c(o_i, o_j) = f(|N(o_i) - N(o_j)|),$$

where it is assumed that $c(o_i, o_j) = 0$ for $o_i = o_j$. Although many
functions $f$ could be used and the actual choice will depend on the researcher's
judgment as to the most appropriate relative size of the structure values, for
the purposes of an illustration, $f$ is taken as the identity, i.e., $f(x) = x$.
In other words, the symmetric function values $c(o_i, o_j)$, given in the upper
triangular portion of Table 2, are merely the absolute values of the differences
in equivalence class sizes associated with the objects $o_i$ and $o_j$.
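The construction of the structure matrix can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' program: the function name `structure_matrix` is hypothetical, and the assignment of class sizes 1, 4, and 8 to the 2, 8, and 7 patterns is an assumption, chosen because the resulting differences 3, 4, and 7 agree with the entries visible in the upper triangle of Table 2.

```python
from typing import Callable, List, Sequence


def structure_matrix(class_sizes: Sequence[int],
                     f: Callable[[int], float] = lambda x: x) -> List[List[float]]:
    """Return C with c(o_i, o_j) = f(|N(o_i) - N(o_j)|) and a zero diagonal."""
    n = len(class_sizes)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal stays at zero by convention
                C[i][j] = f(abs(class_sizes[i] - class_sizes[j]))
    return C


# Assumed class sizes: 2 patterns of size 1, 8 of size 4, 7 of size 8.
sizes = [1] * 2 + [4] * 8 + [8] * 7
C = structure_matrix(sizes)  # f defaults to the identity, f(x) = x
```

Any other monotone function can be passed as `f` if the researcher prefers a different relative weighting of the structure values.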

i ^ j • ' f ' 

As an operational interpretation, the theory used to generate the function
$c(o_i, o_j)$ is given empirical support if the two sets of elements
$c(o_i, o_j)$ and $q(o_i, o_j)$ have a similar patterning of high and low
entries. Although many formal indices for this relationship could be defined,
the pairing of a proximity $q(o_i, o_j)$ with a structure value $c(o_i, o_j)$
suggests that the simple Pearson product-moment correlation may be a natural
measure to consider, and thus, will be our choice for the sequel. Once this
index is calculated, the next problem concerns its significance, and
specifically, whether the size of the observed correlation between the values
for $q(o_i, o_j)$ and $c(o_i, o_j)$ is sufficient to reject some appropriately
defined null hypothesis.
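As a minimal sketch of the index just described, the correlation can be computed over the off-diagonal pairs of the two matrices. The helper names and the small 3 x 3 matrices below are hypothetical and serve only to show the calculation; because both matrices are symmetric, using each pair once (i < j) gives the same correlation as using every ordered pair.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def upper_triangle(M: Matrix) -> List[float]:
    """Collect the entries above the diagonal, one per object pair."""
    n = len(M)
    return [M[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def matrix_correlation(Q: Matrix, C: Matrix) -> float:
    """Correlation between proximities q(o_i, o_j) and structure values c(o_i, o_j)."""
    return pearson(upper_triangle(Q), upper_triangle(C))


# Hypothetical toy data for three objects.
Q_toy = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
C_toy = [[0, 0, 3], [0, 0, 3], [3, 3, 0]]
r_toy = matrix_correlation(Q_toy, C_toy)
```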

To generate a reasonable reference distribution for the observed correlation,
suppose a randomness hypothesis is assumed that hopefully can be rejected.
More specifically, it is conjectured that the partition of the objects (or
patterns) occurred randomly or was chosen at random from the set of all
partitions of the same form. In our case, the conjectured partition contains
three classes with 2, 8, and 7 objects in each, and thus, the null hypothesis of
interest asserts that this particular partition occurred randomly, and
consequently, does not reflect the patterning of entries in the proximity
matrix Q. Moreover, any such partition of the same form (i.e., number of classes
and class sizes) will produce a correlation index, and when completely
enumerated these partitions will generate an exact reference distribution for
the assumed "null" hypothesis. From an inference perspective, the observed
correlation for the conjectured partition can be compared to this distribution,
and if it falls at a suitably extreme percentage point, the "null" hypothesis of
randomness can be rejected. In short, whenever the correlation actually obtained
for the conjectured partition is large enough, this index can be assumed to
reflect a value that was obtained nonrandomly, i.e., at least to some extent,
the functions $q(o_i, o_j)$ and $c(o_i, o_j)$ have a common patterning of high
and low entries.

Although complete enumeration is usually prohibitive because of computational
costs, and thus, an exact reference distribution is typically too expensive to
obtain, Monte Carlo approximations are relatively inexpensive (cf. Hubert and
Schultz, 1976; Schultz and Hubert, 1976). For instance, Table 3 presents the
frequency results of selecting 1000 partitions of the desired form at random
and with replacement, and should provide an approximate distribution that is
fairly accurate for this application. In particular, using the Table 1






Insert Table 3 about here



data, the observed correlation for the Garner hypothesis is .640, which is
greater than any value observed in the Table 3 distribution. Thus, the null
hypothesis of a random partition can be rejected at an approximate significance
level of .000, suggesting that the equivalence class hypothesis is supported by
the patterning of the proximity values.
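The Monte Carlo procedure can be sketched as follows. This is a hedged illustration rather than the program used for Table 3: all function names are hypothetical, the small data set is synthetic (the Glushko proximities are not reproduced here), and the significance level is taken, as in the text, to be the proportion of sampled correlations at least as large as the observed one.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def upper_triangle(M: Matrix) -> List[float]:
    n = len(M)
    return [M[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def reorder(M: Matrix, perm: Sequence[int]) -> Matrix:
    """Reorder rows and, simultaneously, the corresponding columns of M."""
    return [[M[a][b] for b in perm] for a in perm]


def monte_carlo_test(Q: Matrix, C: Matrix, n_samples: int = 1000,
                     seed: int = 0) -> Tuple[float, float]:
    """Return (observed correlation, proportion of sampled values >= observed)."""
    rng = random.Random(seed)
    c_vals = upper_triangle(C)
    observed = pearson(upper_triangle(Q), c_vals)
    perm = list(range(len(Q)))
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(perm)  # each shuffle yields a uniformly random relabeling
        if pearson(upper_triangle(reorder(Q, perm)), c_vals) >= observed:
            hits += 1
    return observed, hits / n_samples


# Synthetic example: structure values plus noise stand in for real proximities.
sizes = [1, 1, 4, 4, 4, 8, 8, 8]
C = [[float(abs(a - b)) for b in sizes] for a in sizes]
noise = random.Random(42)
Q = [row[:] for row in C]
for i in range(len(Q)):
    for j in range(i + 1, len(Q)):
        Q[i][j] = Q[j][i] = C[i][j] + noise.random()

observed, p_value = monte_carlo_test(Q, C)
```

Sampling relabelings of Q against a fixed C is used here in place of sampling partitions directly; with a partition-based structure matrix the two schemes lead to the same reference distribution.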

Using the previous example as a guide, the salient features of a confirmatory
analysis should be evident. Given a proximity measure $q(o_i, o_j)$ and some
conjecture specified in terms of a structure function $c(o_i, o_j)$, the observed
correlation between $q(o_i, o_j)$ and $c(o_i, o_j)$ is compared to a reference
distribution generated under a hypothesis of randomness. If the obtained
correlation is at an extreme percentage point, the correspondence between
$q(o_i, o_j)$ and $c(o_i, o_j)$ is declared "significant", with the added
implication that the conjecture leading to the construction of $c(o_i, o_j)$ may
help "explain" some of the variation present in the empirical proximity measure.

Although the example given above implies that a randomness hypothesis
should be defined in terms of selecting a partition of a given form at random,
a more general hypothesis can also be considered that will generate exactly
the same distribution. Explicitly, if the values assigned by the proximity
function are organized, as before, into an n x n square matrix Q, and similarly,
the values of the structure function into a second n x n square matrix C, both
with rows and columns labeled as $o_1, o_2, \ldots, o_n$, then each reordering of
the rows and simultaneously the corresponding columns of Q in relation to the
fixed C matrix will induce a specific partition of the n objects
$o_1, \ldots, o_n$. In other words, for our C matrix of Table 2, any reordering
of Q produces a partition defined by subsets containing 2, 8, and 7 objects. The
first two rows and columns of the reordered Q matrix define the objects in the
class of size 2, the next 8 rows and columns define a class of 8 objects, and
the remaining 7 rows and columns define the last object class of size 7.
Moreover, if a reordering of Q is chosen at random, that is, all n! possible
reorderings are considered equally likely, then this assumption induces a random
selection of a partition of the same general form used in the original
construction of the C matrix. In short, the random reordering of Q and the
random selection of a partition will generate exactly the same distribution of
correlations, and thus, either concept may be used in producing an approximate
reference table through Monte Carlo simulation. This generalization will prove
very important later when a confirmatory approach is necessary but one that
cannot be identified by a specific partitioning of an object set.
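The correspondence between a reordering and a partition can be made concrete with a short sketch. The function name `induced_partition` is hypothetical; the block sizes 2, 8, and 7 are those of the Table 2 structure matrix, and objects are indexed 0 to 16 for convenience.

```python
import random
from typing import List, Sequence, Set


def induced_partition(order: Sequence[int],
                      class_counts: Sequence[int]) -> List[Set[int]]:
    """Split a row/column ordering into consecutive blocks of the given sizes.

    The first block holds the objects occupying the first rows and columns of
    the reordered matrix, the second block the next ones, and so on.
    """
    classes = []
    start = 0
    for size in class_counts:
        classes.append(set(order[start:start + size]))
        start += size
    return classes


rng = random.Random(1)
order = list(range(17))
rng.shuffle(order)                       # a random reordering of the 17 objects
partition = induced_partition(order, (2, 8, 7))
```

Many distinct reorderings map to the same partition (those differing only within blocks), which is why the two sampling schemes give the same distribution of correlations.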

Although we suggest carrying out a confirmatory test through the use of
an approximate distribution obtained through Monte Carlo simulation, it is
also possible to find the exact mean and variance of the complete reference
distribution by formula, given only the matrices C and Q. Specifically, the
mean of the Pearson correlation is 0 and its variance equal to



2 2 

'Si . \ (2G^-G2fi:2H^-H2) " * ^ 



\ 



where" 



$G_1 = \sum_i \bigl( \sum_j (q(o_i, o_j) - \bar{q}) \bigr)^2$;  $G_2 = \sum_i \sum_j (q(o_i, o_j) - \bar{q})^2$,




and 



As an example of how the variance calculation may be used for the data of Table
1 and the structure function of Table 2, we find $\sqrt{V(r)} = .0879$.
Converting to a Z-score for the observed correlation of .640, a value of 7.28 is
obtained, which would indicate a rather significant result if it were possible
to assume even a crude normality (see Mantel, 1967, for the appropriate moment
derivations).
Example 2

The previous example illustrates how a confirmatory approach might be used
in directly verifying a stated a priori conjecture against the original set of
proximities. As an alternative application of these same principles, it is
also possible to extend several analysis strategies proposed in the literature
that are based on naturally occurring geometric models. Most of the appropriate
references relate to the organization of a group of objects, e.g., people,
census tracts, and so on, derived from some notion of geographic or spatial
contiguity. Within this context, one of the major analysis tasks concerns the
association between spatial contiguity and some other variable or variables
measured on these same objects.

As a concrete example from social psychology, Campbell, Kruskal, and
Wallace (1966) developed an index of seating aggregation and an associated
significance testing strategy for determining whether the observed black-white
seating adjacencies within a classroom might be considered random. The
geometric model in this case is defined by the occupied seats within a
classroom, or more specifically, by the spatial location of the students in a
two-dimensional plane. The outside variable of interest is dichotomous, i.e.,
black or white, and the inference task is one of determining whether the spatial
positioning of blacks and whites indicates aggregation, e.g., whether blacks sit
with blacks and whites sit with whites.

The confirmatory approach developed in the previous section includes the
Campbell et al. approach as a special case. In particular, the set of objects
$\{o_1, o_2, \ldots, o_n\}$ now refers to the set of n people and $q(o_i, o_j)$
refers to some measure of spatial distance obtained from the observed seating
pattern. Although very general measures of distance could be used, Campbell
et al. consider a simple index defined (in our notation) as follows:

$$q(o_i, o_j) = \begin{cases} 1 & \text{if } o_i \text{ and } o_j \text{ are seated adjacently within a single row;} \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

For the Campbell et al. application, the structure function $c(o_i, o_j)$ would
be obtained from the outside variable of race:

$$c(o_i, o_j) = \begin{cases} 1 & \text{if } o_i \text{ and } o_j \text{ are both black or both white;} \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

Consequently, the cross-product statistic, $\sum_{i<j} q(o_i, o_j)\, c(o_i, o_j)$,
which forms the crux of the correlation coefficient between the functions
$c(o_i, o_j)$ and $q(o_i, o_j)$, is the number of same-race adjacencies observed
in the given seating pattern. Using this statistic, evidence for aggregation is
indicated by a large value, or in our correlational context, by a large positive
correlation between $q(o_i, o_j)$ and $c(o_i, o_j)$.
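The cross-product statistic can be illustrated with a small sketch. Everything here is hypothetical: the seating chart, the group labels, and the function names are invented for the illustration, and the 1/0 coding of the adjacency index and of the same-group indicator is assumed from the surrounding text.

```python
from typing import List, Sequence, Tuple

Seat = Tuple[int, int]  # (row, position within the row)


def q_adjacent(a: Seat, b: Seat) -> int:
    """Assumed index (1): 1 if the seats are side by side in one row, else 0."""
    return int(a[0] == b[0] and abs(a[1] - b[1]) == 1)


def c_same_group(x: str, y: str) -> int:
    """Assumed index (2): 1 if both people belong to the same group, else 0."""
    return int(x == y)


def same_group_adjacencies(seats: Sequence[Seat], groups: Sequence[str]) -> int:
    """Cross-product of q and c, counting each pair of people once."""
    n = len(seats)
    return sum(q_adjacent(seats[i], seats[j]) * c_same_group(groups[i], groups[j])
               for i in range(n) for j in range(i + 1, n))


# Two rows of four seats: row 0 is B B W W, row 1 is B W B W.
seats: List[Seat] = [(r, p) for r in range(2) for p in range(4)]
groups = list("BBWW") + list("BWBW")
count = same_group_adjacencies(seats, groups)  # row 0 contributes 2, row 1 none
```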






Although the $c(o_i, o_j)$ function used above is rather simple, it can be
generalized to include variables that represent more than a simple dichotomy.
For example, suppose $x_1, x_2, \ldots, x_n$ denote some numerical variable
attached to each of the n objects and define

$$c(o_i, o_j) = \max\{x_1, \ldots, x_n\} - |x_i - x_j|. \tag{3}$$

If, for instance, $x_i = 0$ when $o_i$ is black and 1 when $o_i$ is white, then
$\max\{x_1, \ldots, x_n\} = 1$, and this function is equivalent to that given in
(2). More significantly, if $x_i$ denotes, say, the age of person $o_i$, then
this same type of function could be used to relate the information contained in
this variable to seating pattern. In other words, when $x_i$ refers to age, then
large positive correlations between $q(o_i, o_j)$ defined as in (1) and
$c(o_i, o_j)$ defined as in (3) will suggest that people of a similar age sit
together. Obviously, many other functions of the variables $x_1, \ldots, x_n$
could be used for defining $c(o_i, o_j)$, and also, other notions of spatial
distance could be used in defining $q(o_i, o_j)$. What is most important,
however, is the general appropriateness of the Monte Carlo significance testing
strategy and the variance term discussed in the previous section.
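A brief sketch of the generalized structure function in (3) follows. The function name and the example values (a 0/1 coding and three ages) are hypothetical; the diagonal is set to zero to match the convention used for the structure matrix earlier.

```python
from typing import List, Sequence


def structure_from_variable(x: Sequence[float]) -> List[List[float]]:
    """c(o_i, o_j) = max{x_1, ..., x_n} - |x_i - x_j|, with a zero diagonal."""
    top = max(x)
    n = len(x)
    return [[0.0 if i == j else top - abs(x[i] - x[j]) for j in range(n)]
            for i in range(n)]


# With a 0/1 coding the function reduces to a same-group indicator.
dichotomy = structure_from_variable([0, 0, 1, 1])

# With a numerical variable such as age, closer ages give larger values.
ages = structure_from_variable([20, 25, 40])
```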

The concepts presented in this example can be extended to several other
situations of interest in social psychology that have been developed in slightly
different directions. For instance, suppose the objects are now census tracts
and the task is to relate an outside variable to contiguity of the tracts
defined geographically (see Cliff & Ord, 1973; Geary, 1954; Royaltey, Astrachan,
& Sokal, 1975). The same analysis procedure is appropriate with geographical
distance defining $q(o_i, o_j)$ and the function $c(o_i, o_j)$ defined, say, as
in (3), or possibly, by some other more complicated function of a vector of
variables available for each object. Also, the analysis strategy for
observations connected through some general network structure, e.g., kinship,
can be approached in the same way and related to outside data, e.g.,
socioeconomic status. In fact, this topic as defined by Winsborough,
Quarantelli, and Yutzy (1963) is also a special case of our confirmatory
analysis strategy.
Example 3

Geometric configurations that are generated as a result of a multidimensional
scaling (see Carroll & Chang, 1970; Kruskal, 1964a,b) represent yet
another context in which the confirmatory paradigm could be used to test
a priori conjectures. Even though these spatial representations are produced
by an explicit data reduction process and consequently do not arise naturally
as in Example 2, some of the same hypothesis testing principles are still
appropriate. To give an illustration, consider the application of Carroll
and Chang's individual differences scaling procedure to data collected by
Wish, Deutsch, and Biener (1970). The objects of study for this analysis were
12 nations; and since each of 18 subjects rated the proximity of all pairs of
nations on a nine-point scale (large numbers indicating a greater degree of
similarity), the resulting 18 proximity matrices, all of size 12 x 12, are
appropriately analyzed by the Carroll-Chang routine (1970). The group result
selected for our discussion is a two-dimensional configuration, shown in
Figure 2, in which each nation is represented by a point, and where the
interpoint distances reflect the degree of similarity between the corresponding
nations, as judged by the group, e.g., the distance between the U.S. and China
is large since they are perceived on the average as being very dissimilar.

Insert Figure 2 about here

Instead of attempting to label and interpret dimensions per se, suppose
the researcher wishes to test the a priori hypothesis that an outside variable,
such as political alignment, accounts in part for the distances between nations.
In other words, the researcher is interested in confirming the conjecture that
nations close together subscribe to similar political philosophies, and
conversely, those far apart have different political systems. In this case the
proximity function $q(o_i, o_j)$ would merely refer to the distance between
nations $o_i$ and $o_j$ in Figure 2. Or more specifically, if $(y_{i1}, y_{i2})$
denotes the numerical coordinates computed by the Carroll-Chang procedure for
the point $o_i$ in Figure 2, then the Euclidean distance between any two points
$o_i$ and $o_j$ in the figure is defined as

$$q(o_i, o_j) = \left[ (y_{i1} - y_{j1})^2 + (y_{i2} - y_{j2})^2 \right]^{1/2}. \tag{4}$$

As in previous examples, the structure function $c(o_i, o_j)$ would be obtained
from the outside variable of political alignment. For instance, if political
alignment were simply dichotomized as communist vs. noncommunist, then
$c(o_i, o_j)$ might be defined as

$$c(o_i, o_j) = \begin{cases} 0 & \text{if } o_i \text{ and } o_j \text{ are both communistic or both noncommunistic;} \\ 1 & \text{otherwise.} \end{cases} \tag{5}$$



With this notation, a large positive correlation between $q(o_i, o_j)$ as
defined in (4) and $c(o_i, o_j)$ as defined in (5) would indicate that nations of
similar political persuasion are located close together in Figure 2. As it turns
out, the observed correlation between the interpoint distances in Figure 2 and
the dichotomous variable of political alignment given by (5) is .50, which is
significant at the .000 level (approximately) when referred to the distribution
of correlations for 1000 random reorderings of matrix Q. In short, there is
convincing statistical support for the hypothesis that political alignment
partially accounts for the arrangement of points. This conclusion is consistent
with the descriptive analysis conducted by Wish et al. (1970), which identified
the vertical dimension of Figure 2 as a political alignment factor with
communist nations at the top. In a similar manner an outside variable such
as gross national product could be used to support the Wish et al. contention
that the horizontal axis of Figure 2 might be identified as "economic
development", e.g., the more highly developed nations are located to the right.
Again, however, our aim would be to relate the gross national product index
directly to the interpoint distances, and thus, any explicit dimensional
representation would be ignored.
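To make the two ingredients of this example concrete, the sketch below builds a matrix of Euclidean interpoint distances from two-dimensional coordinates and a 0/1 structure matrix from a dichotomous label. The coordinates and labels are invented for illustration and are not the Carroll-Chang solution; the 0 (same group) and 1 (different group) coding is assumed, so that larger structure values pair with larger distances.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_matrix(points: Sequence[Point]) -> List[List[float]]:
    """Euclidean distance between every pair of points in the plane."""
    return [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in points]
            for a in points]


def dichotomy_structure(labels: Sequence[str]) -> List[List[int]]:
    """0 if two objects share a label, 1 otherwise (diagonal is 0)."""
    return [[0 if x == y else 1 for y in labels] for x in labels]


# Hypothetical coordinates and alignments for four objects.
points: List[Point] = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0), (3.0, 3.0)]
labels = ["A", "B", "A", "B"]

Q = distance_matrix(points)
C = dichotomy_structure(labels)
```

The off-diagonal entries of `Q` and `C` would then be correlated and referred to a Monte Carlo reference distribution, exactly as in the first example.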

In the example given above, political alignment was related to the interpoint
distances of Figure 2 as derived from a particular data reduction procedure.
However, since the distance between two points in Figure 2 is simply a
graphic representation of the rated similarity between two nations, the
political alignment hypothesis could also be tested directly against subjects'
raw proximity ratings, thus bypassing Figure 2 and the Carroll-Chang analysis
altogether. For instance, the mean similarity ratings for each pair of nations
in the Wish et al. study are as shown in Table 4, with larger values now
indicating greater similarity. Thus, if Table 4 is used to define a new
proximity function $q(o_i, o_j)$ and if (5) again represents the structure
function $c(o_i, o_j)$, then a large negative correlation between $q(o_i, o_j)$
and $c(o_i, o_j)$ would directly indicate that political alignment plays an
important part in the formation of subjects' similarity judgments, i.e., nations
which have the same political systems also receive high similarity ratings. The
actual correlation between $q(o_i, o_j)$ and $c(o_i, o_j)$ for these data is
[illegible in source], which is significant at an approximate .013 level when
referred to the distribution of correlations for 1,000 random permutations of
Table 4. In summary, the political alignment hypothesis can be tested either
against Figure 2 or against the raw proximity data, with the latter being
somewhat simpler and more direct if the researcher is willing to sacrifice the
advantage of a pictorial representation.

Insert Table 4 about here

In addition to the geometric configuration of nations given in Figure 2,
the Carroll-Chang procedure also produces a configuration of the particular
subjects that supplied the similarity data, as shown in Figure 3. The horizontal
and vertical axes of Figure 3 are exactly the same as those of Figure 2,
representing economic development and political alignment, respectively.
Numerical coordinates $(w_{i1}, w_{i2})$ again locate a subject $o_i$ in Figure
3, and furthermore, indicate how much emphasis a subject gives to political
alignment and economic development when rating the similarities of nations. For
instance, referring to Figure 3, subject 10 gives primary emphasis to the
economic development dimension, subject 11 gives primary emphasis to political
alignment, and subjects in the center of the configuration weight both
dimensions about equally.

Insert Figure 3 about here

As indicated in Figure 3, Wish et al. further classified each subject
either as a hawk (H), moderate (M), or a dove (D) according to the person's
stance on the Vietnam War, and descriptively argue that subjects in the same
class tend to weight the two dimensions similarly. In other words, since it is
hypothesized that hawks, moderates, and doves will form reasonably homogeneous
clusters in Figure 3, the confirmatory paradigm provides a statistical test for
the conjecture that subjects weight dimensions differentially according to
their political opinions. Again, the proximity function is defined as the
Euclidean distance between points $o_i$ and $o_j$ in Figure 3:

$$q(o_i, o_j) = \left[ (w_{i1} - w_{j1})^2 + (w_{i2} - w_{j2})^2 \right]^{1/2}, \tag{6}$$

and for the sake of simplicity, the structure function is defined as

$$c(o_i, o_j) = \begin{cases} 0 & \text{if } o_i \text{ and } o_j \text{ belong to the same class (hawk, moderate, or dove);} \\ 1 & \text{otherwise.} \end{cases} \tag{7}$$

A large positive correlation between the function values $q(o_i, o_j)$ and
$c(o_i, o_j)$ given in (6) and (7) supports the conjecture that hawks,
moderates, and doves tend to form separate clusters in Figure 3. Since the
observed correlation is .19, which is significant at an approximate .009 level,
the hypothesis is given statistical support. Wish et al. noted specifically
that hawks tend to cluster above the diagonal in Figure 3 and give relatively
more emphasis to the political alignment factor, whereas moderates and doves
cluster below the diagonal and give relatively more weight to economic
development.

Although it should be clear that exploratory analyses such as
multidimensional scaling may generate a rich source of hypotheses and the
confirmatory paradigm may provide a useful means for subsequently testing such
hypotheses more formally, a word of explicit caution is also in order.
Specifically, a hypothesis arising from an exploratory analysis of a particular
data set should not be tested on the same set of data, since such a strategy
amounts to "data snooping" and may produce significant results that cannot be
replicated. Typically, if the researcher is bound by a single data set it may
still be possible to take advantage of both exploratory and confirmatory
analyses by randomly dividing the data in half. Exploratory analyses could then
be applied to one half of the data, generating interesting research questions,
which could be tested on the second half using the confirmatory paradigm. For
instance, the 18 subjects in the Wish et al. study could be assigned in equal
numbers to two groups, and the Carroll-Chang procedure applied to the data of
one group, giving rise to a representation such as Figure 2 and the associated
political alignment conjecture. The confirmatory paradigm could then be applied
to the remaining data, leading to either acceptance or rejection of the a priori
hypothesis.
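The random division suggested here is simple to carry out. The sketch below is only one way of doing it: the function name, the seed, and the numbering of the 18 subjects from 1 to 18 are assumptions made for the illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_in_half(items: Sequence[int], seed: int) -> Tuple[List[int], List[int]]:
    """Randomly assign items to two groups of (nearly) equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


subjects = list(range(1, 19))                    # 18 subjects, numbered 1 to 18
explore, confirm = split_in_half(subjects, seed=7)
# `explore` would feed the exploratory scaling; `confirm` is held back
# for the confirmatory test of whatever conjecture the scaling suggests.
```

Fixing the seed makes the assignment reproducible, which is useful if the split itself has to be reported.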

Example 4

The application of the confirmatory paradigm illustrated in Example 3 can
easily be generalized beyond the specific context of multidimensional scaling
and used to investigate data that are more traditionally considered within an
analysis of variance design. As an illustration, suppose
$(y_{i1}, y_{i2}, \ldots, y_{ir})$ represents a profile of r different
measurements on an object $o_i$. As in the previous example, $o_i$ can be
represented as a point in r-dimensional space defined by these numerical
coordinates, and furthermore, if the r measures are commensurate, or have been
made so by an appropriate standardizing transformation, then a Euclidean (or
other) distance between objects $o_i$ and $o_j$ in r-dimensional space could be
used to define a proximity function of the form

$$q(o_i, o_j) = \left[ \sum_{k=1}^{r} (y_{ik} - y_{jk})^2 \right]^{1/2}. \tag{8}$$

As a numerical example, Table 5 contains simulated profiles for n = 21 objects
(persons) on r = 3 variables (e.g., standardized tests) taken from Mielke,
Berry, & Johnson (1976, p. 141[illegible]). The three sets of measures are
commensurate in the sense that all have the same range, and thus, may be
substituted directly into (8) to obtain the distance between any object pair
$o_i$ and $o_j$.

Insert Table 5 about here

The objects of Table 5 have been partitioned into four distinct subgroups
on the basis of some outside variable, e.g., freshman/sophomore/junior/senior,
and as in the previous examples, this outside variable can be used to define a
structure function such as:

$$c(o_i, o_j) = \begin{cases} 0 & \text{if } o_i \text{ and } o_j \text{ are members of the same subgroup;} \\ 1 & \text{otherwise.} \end{cases} \tag{9}$$



If the confirmatory paradigm is used to test the hypothesis that the outside
variable accounts in part for the arrangement of points in r-dimensional space,
then a large positive correlation between $c(o_i, o_j)$ as defined in (9) and
$q(o_i, o_j)$ as defined in (8) would support the conjecture that students in the
same academic year tend to have similar test profiles, i.e., they tend to be
close together in three-dimensional space. In the case of Table 5, this
correlation is .55, which is significant at an approximate .000 level. In other
words, the comparison of $q(o_i, o_j)$ and $c(o_i, o_j)$ essentially carries out
a multivariate analysis of variance involving four groups and r measurements on
each subject in a group. Furthermore, even though the outside variable in this
example is treated as a simple categorical measure, it should be clear from
the discussion related to the structure function in (3) that an analogous
function could be defined for variables measured on higher order scales. In
fact, we could explicitly take the freshman/sophomore/junior/senior ordering
into account in our illustration.
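One way the ordering of the four subgroups might be taken into account is sketched below. This is an assumption for illustration, not a definition given in the text: class years are coded 1 to 4 and the structure value is the absolute difference in codes, so that adjacent years are treated as closer than distant ones. The profiles are invented and are not the Table 5 data.

```python
import math
from typing import List, Sequence


def profile_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two r-dimensional profiles."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def ordinal_structure(ranks: Sequence[int]) -> List[List[int]]:
    """Hypothetical ordered structure function: |rank_i - rank_j|."""
    return [[abs(x - y) for y in ranks] for x in ranks]


# Invented profiles on r = 3 measures, with class years coded 1 (freshman)
# through 4 (senior).
profiles = [(1.0, 2.0, 2.0), (0.0, 0.0, 0.0), (4.0, 4.0, 2.0)]
years = [1, 2, 4]

Q = [[profile_distance(a, b) for b in profiles] for a in profiles]
C = ordinal_structure(years)
```

Replacing `ordinal_structure` with a 0/1 same-subgroup indicator recovers the purely categorical structure function of (9).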

Discussion



As should be evident from the examples given above, the confirmatory approach
developed in this paper has a number of applications related to the use and
development of geometric models, either those that occur naturally or those
derived from some intermediate data reduction process. In addition to the
illustrations provided, a number of other correspondences to the methodological
literature of the behavioral sciences could be developed that the reader may be
interested in pursuing further. For instance, Carroll and Chang (Note 1)
suggest a general index of nonlinear correlation between two sets of
observations $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_n\}$ defined by

\ ' • ° . - """^ . ; ■ ■ 

where

$$s_y^2 = (1/n) \sum_{i=1}^{n} (y_i - \bar{y})^2$$

and $w_{ij}$ is some decreasing monotonic function of $|x_i - x_j|$.
Intuitively, the smaller K is, the greater the inferred nonlinear correlation.

Since $s_y^2$ is constant over all permutations of the y's, a permutation
distribution for K can be obtained by considering only its numerator, treating
the values $w_{ij}$ as the Q matrix and $(y_i - y_j)^2$ as the C matrix. As
discussed by Carroll and Chang, K itself includes, as special cases, the
well-known correlation ratio as well as von Neumann's autocorrelation
statistic. In fact, as a second general application, a substantial literature
in sociology exists in using what is called the contiguity ratio, which is
almost identical in form to K, as a way of relating the structure of social
networks to outside variables (e.g., see Winsborough, Quarantelli, & Yutzy,
1963; Althauser, Burdick, & Winsborough, 1966). For a number of other
applications of the type of analysis discussed in this paper but which are not
specifically tied to geometric modeling, the reader should consult Schultz and
Hubert (1976), Hubert and Baker (1977), Hubert and Levin (1976a,b), Hubert and
Schultz (1976), and Hubert (1977).






Table 1

The Patterns Used by Glushko in Testing Garner's
Pattern Goodness Hypothesis

Equivalence class size    Patterns

         1                (1), (2)
         4                (3) through (10)
         8                (11) through (17)



Table 2

The Symmetric Proximity Matrix Obtained by Glushko for the Patterns of Table 1
(Lower Triangle) and the Structure Matrix Generated by the
Equivalence Class Hypothesis (Upper Triangle)
















N 

> 














Pattern 


1 2 3 


4 


^ 

5. 6 7 

> 


8 


9 10 11 


12 


13 


14 


15, 


16 


17 ■ 




1 


X 0 3 


3 


3' 3 3 




337 


7 


'.7 . 


7" 


7 


7 


7 




2 

37 . 


.1 *X ' 3 

-1 2 X 


3 , 
0 


3 3 3 
0 0 - 'm) 


0 


3377 

0 0 4^ 4 


4 -4 


7 
4 


7 
4 


'7. [ 
4 




5 


2 4 0 

3 '3 


X 

1 


0 0 0 
X 0^0 


0 

V/ 

0 

V/ 


0 0 4" 
0 0 4 


4 
4 


4 . 
4 


4 

' 4 
4< 


4. 
4 


4 

4^ 


- 4 
4 


_ 


6 


2 4 1 


1 


•1X0 


0 


0 0 , "4 


A. 


4 


4 


4 


4.;r4 




7 


2 4 3 


2 - 


1 2 ' X 




-O- ' 0 4 


4 


4 


4 


4 


4 


4 




8" 


*? 5 • 2 . 


1- 


2 '1. *o 


X * 


0 "0 4 


4 


'4 


4 










' 9 


4 4 2 


1 


5 3 3 


4 


X. ^0 4 


4 




4 




4 


4 




HO 


4 ;5 .4 


4 


3 3^ 3 


5 


4X4^ 


4 


4 


4 


4 


■ 4 


4 ■ - 


XI 


^553 


4 , 


3, b. . 2 


3 


1 1 x' 


' 0 


0 


0 


0 


0 * 


0 


> 

! 

\ 


12 
13 


5 '6 4 6 
'.6 7 7 ^6 


A 1. 5 
5 4" 5 


5 

/ 


111 

5*^ 1 4 


X 

-1 

— J 


0, 
X 


.0 
0 


0 
0. 


0' 

_0 


° ■ ■ N- . 

0 V - 




14. 

/ 

I 


7" 6/. 4 


4 


5 4.6 




4 2 4 


1 


1 


X , 


0 


0 


'0' C - _ 




is 


6.7 5 


7. 


4 5 5- 


4 


5 a- 3 


0 


0 


1 


X 


0 


0 ' r : 






7 8j' 5 


5 


6 4.4 


3 


4 1 '4 


2 


2 


0 


1 


X ' 


0 . 




17 


7 7 5' 


5 


5 6 5 


4 


6 ' 3 6 


-2 


3 


1 


1 - 




. X . 








> 


I 


















* \ 



























































Table 3

Approximate Distribution for the Comparison of the
Structure and Proximity Matrices Given in
Table 2 (Sample Size of 1000)

Correlation      Sample Cumulative Proportion

  -.193                    .001
  -.171                    .005
  -.162                    .010
  -.117                    .050
  -.098                    .100
  -.070                    .200
  -.0[?]6                  .300
  -.025                    .400
  -.009                    .500
  [illegible]              [illegible]
   .033                    .700
   .068                    .800
   .115                    .900
   .162                    .950
   .273                    .990
   .297                    .995
   .396                    .999
   .420                   1.000






Table 4

Mean Similarity Ratings for 12 Nations (a)

                      1     2     3     4     5     6     7     8     9    10    11

 1  Brazil
 2  Congo           4.83
 3  Cuba            5.28  4.56
 4  Egypt           3.44  5.00  5.17
 5  France          4.72  4.00  4.11  4.98
 6  India           4.50  4.83  4.00  5.83  3.44
 7  Israel          3.83  3.33  3.61  4.67  4.00  4.11
 8  Japan           3.50  3.39  2.94  3.83  4.22  4.50  4.83
 9  China           2.39  4.00  5.50  4.39  3.67  4.11  3.00  4.17
10  Russia          3.06  3.39  5.44  4.39  5.06  4.50  [?]   4.61  5.72
11  U.S.            5.39  2.39  3.17  3.33  5.94  4.28  5.94  6.06  2.56  5.00
12  Yugoslavia      3.17  3.50  5.11  4.28  4.72  4.00  4.44  4.28  5.06  6.67  3.56

(a) Larger numbers indicate greater similarity






Figure Captions 

Figure 1. Two-dimensional scaling of the Glushko patterns.
Figure 2. Two-dimensional configuration of 12 nations.
Figure 3. Two-dimensional configuration of 18 subjects.






Reference Note 



1. Carroll, J. D., & Chang, J. J. A general index of nonlinear correlation
   and its application to the problem of relating physical and psychological
   dimensions. Unpublished manuscript available from Bell Telephone
   Laboratories, Murray Hill, New Jersey (no date).




References

Althauser, R. P., Burdick, D. S., & Winsborough, H. H. The standardized
    contiguity ratio. Social Forces, 1966, 45, 237-245.

Campbell, D. T., Kruskal, W. H., & Wallace, W. P. Seating aggregation as an
    index of attitude. Sociometry, 1966, 29, 1-15.

Carroll, J. D., & Chang, J. J. Analysis of individual differences in
    multidimensional scaling via an N-way generalization of "Eckart-Young"
    decomposition. Psychometrika, 1970, 35, 283-319.

Cliff, A. D., & Ord, J. K. Spatial autocorrelation. London: Pion, 1973.

Garner, W. R. Uncertainty and structure as psychological concepts. New York:
    Wiley, 1962.

Geary, R. C. The contiguity ratio and statistical mapping. The Incorporated
    Statistician, 1954, 5, 115-141.

Glushko, R. J. Pattern goodness and redundancy revisited: Multidimensional
    scaling and hierarchical clustering analyses. Perception and
    Psychophysics, 1975, 17, 158-162.

Hubert, L. J. Nonparametric tests for patterns in geographic variation:
    Possible generalizations. Geographical Analysis, 1977, in press.

Hubert, L. J., & Baker, F. B. Analyzing distinctive features. Journal of
    Educational Statistics, 1977, 2, 79-98.

Hubert, L. J., & Levin, J. R. Evaluating object set partitions: Free-sort
    analysis and some generalizations. Journal of Verbal Learning and Verbal
    Behavior, 1976, 15, 459-470. (a)

Hubert, L. J., & Levin, J. R. A general statistical framework for assessing
    categorical clustering in free recall. Psychological Bulletin, 1976, 83,
    1072-1080. (b)


Hubert, L. J., & Schultz, J. V. Quadratic assignment as a general data
    analysis strategy. The British Journal of Mathematical and Statistical
    Psychology, 1976, 29, 190-241.

Johnson, S. C. Hierarchical clustering schemes. Psychometrika, 1967, 32,
    241-254.

Kaiser, H. F. A second generation little jiffy. Psychometrika, 1970, 35,
    401-415.

Kruskal, J. B. Multidimensional scaling: A numerical method. Psychometrika,
    1964, 29, 1-27. (a)

Kruskal, J. B. Multidimensional scaling by optimizing goodness of fit to a
    nonmetric hypothesis. Psychometrika, 1964, 29, 115-129. (b)

Mantel, N. The detection of disease clustering and a generalized regression
    approach. Cancer Research, 1967, 27, 209-220.

Mielke, P. W., Berry, K. J., & Johnson, E. S. Multi-response permutation
    procedures for a priori classifications. Communications in Statistics -
    Theory and Methods, 1976, 5, 1409-1424.

Royaltey, H. H., Astrachan, E., & Sokal, R. R. Tests for patterns in
    geographic variation. Geographical Analysis, 1975, 7, 369-395.

Schultz, J. V., & Hubert, L. J. A nonparametric test for the correspondence
    between two proximity matrices. Journal of Educational Statistics, 1976,
    1, 59-67.

Winsborough, H. H., Quarantelli, E. L., & Yutzy, D. The similarity of
    connected observations. American Sociological Review, 1963, 28, 977-983.

Wish, M., Deutsch, M., & Biener, L. Differences in conceptual structures of
    nations: An exploratory study. Journal of Personality and Social
    Psychology, 1970, 16, 361-373.






Footnote 



Equal authorship is implied. Lawrence Hubert and Michael Subkoviak were
supported respectively by grants from the National Science Foundation (SOC
75-07860) and the National Institute of Education (NIE-G-...).

Requests for reprints should be addressed to Lawrence Hubert, Department
of Educational Psychology, The University of Wisconsin, Madison, Wisconsin,
537...






